Overview

Dataset statistics

Number of variables: 22
Number of observations: 43097
Missing cells: 82769
Missing cells (%): 8.7%
Duplicate rows: 53
Duplicate rows (%): 0.1%
Total size in memory: 7.2 MiB
Average record size in memory: 176.0 B
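The two percentages in this block can be reproduced from the raw counts. The following is a minimal sketch, assuming that missing cells are divided by variables × observations and duplicate rows by the number of observations; the helper name `percent` is hypothetical and not part of the report.

```python
# Figures copied from the dataset statistics above.
n_variables = 22
n_observations = 43097
missing_cells = 82769
duplicate_rows = 53


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Assumption: every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = percent(missing_cells, total_cells)
duplicate_pct = percent(duplicate_rows, n_observations)

print(total_cells, missing_pct, duplicate_pct)
```

Under these assumptions the computed values agree with the 8.7% and 0.1% shown above.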

Variable types

Categorical: 12
Numeric: 9
DateTime: 1

Alerts

Dataset has 53 (0.1%) duplicate rows [Duplicates]
belongs_to_collection has a high cardinality: 1685 distinct values [High cardinality]
genres has a high cardinality: 4005 distinct values [High cardinality]
original_language has a high cardinality: 88 distinct values [High cardinality]
overview has a high cardinality: 41966 distinct values [High cardinality]
production_companies has a high cardinality: 22367 distinct values [High cardinality]
production_countries has a high cardinality: 2311 distinct values [High cardinality]
spoken_languages has a high cardinality: 1793 distinct values [High cardinality]
tagline has a high cardinality: 19884 distinct values [High cardinality]
title has a high cardinality: 39995 distinct values [High cardinality]
ActorNames has a high cardinality: 42656 distinct values [High cardinality]
DirectorNames has a high cardinality: 17738 distinct values [High cardinality]
budget is highly overall correlated with revenue and 1 other field [High correlation]
popularity is highly overall correlated with vote_count [High correlation]
revenue is highly overall correlated with budget and 2 other fields [High correlation]
vote_count is highly overall correlated with popularity and 1 other field [High correlation]
return is highly overall correlated with budget and 1 other field [High correlation]
original_language is highly imbalanced (66.9%) [Imbalance]
production_countries is highly imbalanced (57.7%) [Imbalance]
spoken_languages is highly imbalanced (62.1%) [Imbalance]
status is highly imbalanced (97.0%) [Imbalance]
belongs_to_collection has 38655 (89.7%) missing values [Missing]
genres has 1706 (4.0%) missing values [Missing]
overview has 873 (2.0%) missing values [Missing]
production_companies has 9985 (23.2%) missing values [Missing]
production_countries has 4914 (11.4%) missing values [Missing]
spoken_languages has 2848 (6.6%) missing values [Missing]
tagline has 23063 (53.5%) missing values [Missing]
DirectorNames has 444 (1.0%) missing values [Missing]
popularity is highly skewed (γ1 = 28.86291848) [Skewed]
return is highly skewed (γ1 = 134.8106964) [Skewed]
overview is uniformly distributed [Uniform]
tagline is uniformly distributed [Uniform]
title is uniformly distributed [Uniform]
ActorNames is uniformly distributed [Uniform]
budget has 34261 (79.5%) zeros [Zeros]
revenue has 35703 (82.8%) zeros [Zeros]
runtime has 1276 (3.0%) zeros [Zeros]
vote_average has 2369 (5.5%) zeros [Zeros]
vote_count has 2277 (5.3%) zeros [Zeros]
return has 37715 (87.5%) zeros [Zeros]

Reproduction

Analysis started: 2023-06-08 13:59:32.965129
Analysis finished: 2023-06-08 14:00:19.516942
Duration: 46.55 seconds
Software version: pandas-profiling v3.6.6
Download configuration: config.json

Variables

belongs_to_collection
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 1685
Distinct (%): 37.9%
Missing: 38655
Missing (%): 89.7%
Memory size: 336.8 KiB
The Bowery Boys 29
Totò Collection 27
James Bond Collection 26
Zatôichi: The Blind Swordsman 26
The Carry On Collection 25
Other values (1680) 4309

Length

Max length: 54
Median length: 43
Mean length: 23.897344
Min length: 3

Characters and Unicode

Total characters: 106152
Distinct characters: 166
Distinct categories: 12
Distinct scripts: 7
Distinct blocks: 8
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
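As an illustration of the character properties mentioned here, the sketch below counts characters by Unicode general category using Python's standard `unicodedata` module. It is not the report's own implementation; the function name `category_counts` is hypothetical, and the two-letter codes returned by `unicodedata.category` (for example `Lu`, `Ll`, `Zs`) correspond to the long names used in this report (Uppercase Letter, Lowercase Letter, Space Separator).

```python
import unicodedata
from collections import Counter


def category_counts(text: str) -> Counter:
    """Count characters by Unicode general category code (e.g. 'Lu', 'Ll', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)


# "Totò Collection" is one of the values reported for this variable;
# the ò is written as an escape so the example does not depend on file encoding.
counts = category_counts("Tot\u00f2 Collection")
print(counts)
```

Summing such counters over every value in a column would give a category table of the kind shown under "Most occurring categories".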

Unique

Unique: 395
Unique (%): 8.9%

Sample

1st row: Toy Story Collection
2nd row: Grumpy Old Men Collection
3rd row: Father of the Bride Collection
4th row: James Bond Collection
5th row: Balto Collection

Common Values

Value | Count | Frequency (%)
The Bowery Boys | 29 | 0.1%
Totò Collection | 27 | 0.1%
James Bond Collection | 26 | 0.1%
Zatôichi: The Blind Swordsman | 26 | 0.1%
The Carry On Collection | 25 | 0.1%
Pokémon Collection | 23 | 0.1%
Charlie Chan (Sidney Toler) Collection | 21 | < 0.1%
Godzilla (Showa) Collection | 16 | < 0.1%
Charlie Chan (Warner Oland) Collection | 15 | < 0.1%
Dragon Ball Z (Movie) Collection | 15 | < 0.1%
Other values (1675) | 4219 | 9.8%
(Missing) | 38655 | 89.7%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
collection 3714
25.3%
the 1134
 
7.7%
of 228
 
1.6%
series 141
 
1.0%
139
 
0.9%
trilogy 85
 
0.6%
and 84
 
0.6%
a 62
 
0.4%
man 62
 
0.4%
in 56
 
0.4%
Other values (2394) 8953
61.1%
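The word table above appears to count lowercase, whitespace-separated tokens. The sketch below shows one way such counts could be produced, assuming exactly that tokenization; the report does not state its tokenizer, and the function name `word_counts` is hypothetical.

```python
from collections import Counter

# The five sample values listed under "Sample" for belongs_to_collection.
samples = [
    "Toy Story Collection",
    "Grumpy Old Men Collection",
    "Father of the Bride Collection",
    "James Bond Collection",
    "Balto Collection",
]


def word_counts(values):
    """Count lowercase whitespace-separated tokens across all values."""
    counts = Counter()
    for value in values:
        counts.update(value.lower().split())
    return counts


counts = word_counts(samples)
print(counts.most_common(3))
```

Even on five rows the token "collection" dominates, which matches its position at the top of the table.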

Most occurring characters

ValueCountFrequency (%)
o 11023
 
10.4%
e 10349
 
9.7%
10217
 
9.6%
l 10135
 
9.5%
i 7486
 
7.1%
n 7335
 
6.9%
t 6435
 
6.1%
c 4803
 
4.5%
C 4437
 
4.2%
a 4420
 
4.2%
Other values (156) 29512
27.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 80401
75.7%
Uppercase Letter 13766
 
13.0%
Space Separator 10217
 
9.6%
Other Punctuation 574
 
0.5%
Close Punctuation 332
 
0.3%
Open Punctuation 332
 
0.3%
Decimal Number 317
 
0.3%
Dash Punctuation 162
 
0.2%
Other Letter 37
 
< 0.1%
Final Punctuation 9
 
< 0.1%
Other values (2) 5
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 11023
13.7%
e 10349
12.9%
l 10135
12.6%
i 7486
9.3%
n 7335
9.1%
t 6435
8.0%
c 4803
 
6.0%
a 4420
 
5.5%
r 3836
 
4.8%
s 2559
 
3.2%
Other values (69) 12020
15.0%
Uppercase Letter
ValueCountFrequency (%)
C 4437
32.2%
T 1511
 
11.0%
S 1051
 
7.6%
B 678
 
4.9%
M 624
 
4.5%
A 501
 
3.6%
D 498
 
3.6%
H 459
 
3.3%
P 428
 
3.1%
G 416
 
3.0%
Other values (33) 3163
23.0%
Other Letter
ValueCountFrequency (%)
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
2
 
5.4%
Other values (4) 8
21.6%
Other Punctuation
ValueCountFrequency (%)
. 172
30.0%
' 107
18.6%
: 99
17.2%
, 79
13.8%
& 52
 
9.1%
! 34
 
5.9%
/ 21
 
3.7%
? 4
 
0.7%
3
 
0.5%
* 3
 
0.5%
Decimal Number
ValueCountFrequency (%)
1 80
25.2%
9 64
20.2%
3 52
16.4%
0 51
16.1%
2 19
 
6.0%
8 13
 
4.1%
5 12
 
3.8%
7 11
 
3.5%
6 10
 
3.2%
4 5
 
1.6%
Close Punctuation
ValueCountFrequency (%)
) 327
98.5%
] 5
 
1.5%
Open Punctuation
ValueCountFrequency (%)
( 327
98.5%
[ 5
 
1.5%
Dash Punctuation
ValueCountFrequency (%)
- 160
98.8%
2
 
1.2%
Space Separator
ValueCountFrequency (%)
10217
100.0%
Final Punctuation
ValueCountFrequency (%)
9
100.0%
Modifier Letter
ValueCountFrequency (%)
3
100.0%
Other Number
ValueCountFrequency (%)
½ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 93753
88.3%
Common 11948
 
11.3%
Cyrillic 414
 
0.4%
Hiragana 15
 
< 0.1%
Hangul 10
 
< 0.1%
Katakana 9
 
< 0.1%
Han 3
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 11023
11.8%
e 10349
11.0%
l 10135
10.8%
i 7486
 
8.0%
n 7335
 
7.8%
t 6435
 
6.9%
c 4803
 
5.1%
C 4437
 
4.7%
a 4420
 
4.7%
r 3836
 
4.1%
Other values (70) 23494
25.1%
Cyrillic
ValueCountFrequency (%)
л 48
 
11.6%
и 41
 
9.9%
о 37
 
8.9%
к 30
 
7.2%
е 27
 
6.5%
я 25
 
6.0%
а 17
 
4.1%
К 16
 
3.9%
ц 16
 
3.9%
р 14
 
3.4%
Other values (32) 143
34.5%
Common
ValueCountFrequency (%)
10217
85.5%
) 327
 
2.7%
( 327
 
2.7%
. 172
 
1.4%
- 160
 
1.3%
' 107
 
0.9%
: 99
 
0.8%
1 80
 
0.7%
, 79
 
0.7%
9 64
 
0.5%
Other values (20) 316
 
2.6%
Hiragana
ValueCountFrequency (%)
3
20.0%
3
20.0%
3
20.0%
3
20.0%
3
20.0%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%
Katakana
ValueCountFrequency (%)
3
33.3%
3
33.3%
3
33.3%
Han
ValueCountFrequency (%)
3
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 105436
99.3%
Cyrillic 414
 
0.4%
None 248
 
0.2%
Hiragana 15
 
< 0.1%
Punctuation 14
 
< 0.1%
Katakana 12
 
< 0.1%
Hangul 10
 
< 0.1%
CJK 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 11023
 
10.5%
e 10349
 
9.8%
10217
 
9.7%
l 10135
 
9.6%
i 7486
 
7.1%
n 7335
 
7.0%
t 6435
 
6.1%
c 4803
 
4.6%
C 4437
 
4.2%
a 4420
 
4.2%
Other values (67) 28796
27.3%
Cyrillic
ValueCountFrequency (%)
л 48
 
11.6%
и 41
 
9.9%
о 37
 
8.9%
к 30
 
7.2%
е 27
 
6.5%
я 25
 
6.0%
а 17
 
4.1%
К 16
 
3.9%
ц 16
 
3.9%
р 14
 
3.4%
Other values (32) 143
34.5%
None
ValueCountFrequency (%)
é 46
18.5%
ä 41
16.5%
ô 35
14.1%
ò 28
11.3%
ö 19
7.7%
ó 14
 
5.6%
ı 14
 
5.6%
í 9
 
3.6%
á 4
 
1.6%
İ 4
 
1.6%
Other values (19) 34
13.7%
Punctuation
ValueCountFrequency (%)
9
64.3%
3
 
21.4%
2
 
14.3%
Katakana
ValueCountFrequency (%)
3
25.0%
3
25.0%
3
25.0%
3
25.0%
Hiragana
ValueCountFrequency (%)
3
20.0%
3
20.0%
3
20.0%
3
20.0%
3
20.0%
CJK
ValueCountFrequency (%)
3
100.0%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%

budget
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 1208
Distinct (%): 2.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4460174.4
Minimum: 0
Maximum: 3.8 × 10^8
Zeros: 34261
Zeros (%): 79.5%
Negative: 0
Negative (%): 0.0%
Memory size: 336.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 26000000
Maximum: 3.8 × 10^8
Range: 3.8 × 10^8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 17870075
Coefficient of variation (CV): 4.0065865
Kurtosis: 63.154787
Mean: 4460174.4
Median Absolute Deviation (MAD): 0
Skewness: 6.929464
Sum: 1.9222014 × 10^11
Variance: 3.1933957 × 10^14
Monotonicity: Not monotonic
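Two of these figures are derived from the others: the coefficient of variation is the standard deviation divided by the mean, and the variance is the squared standard deviation. A short check using the rounded values printed above (so agreement is only approximate):

```python
# Values copied from the descriptive statistics for budget.
std = 17870075.0
mean = 4460174.4

cv = std / mean        # coefficient of variation
variance = std ** 2    # variance as the squared standard deviation

print(f"CV = {cv:.7f}, variance = {variance:.7e}")
```

Both results agree with the reported CV and variance to within the rounding of the inputs.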
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
0 | 34261 | 79.5%
5000000 | 285 | 0.7%
10000000 | 261 | 0.6%
20000000 | 242 | 0.6%
2000000 | 242 | 0.6%
15000000 | 226 | 0.5%
3000000 | 222 | 0.5%
25000000 | 206 | 0.5%
1000000 | 196 | 0.5%
30000000 | 192 | 0.4%
Other values (1198) | 6764 | 15.7%

Value | Count | Frequency (%)
0 | 34261 | 79.5%
1 | 24 | 0.1%
2 | 13 | < 0.1%
3 | 9 | < 0.1%
4 | 10 | < 0.1%
5 | 7 | < 0.1%
6 | 5 | < 0.1%
7 | 4 | < 0.1%
8 | 5 | < 0.1%
9 | 1 | < 0.1%

Value | Count | Frequency (%)
380000000 | 1 | < 0.1%
300000000 | 1 | < 0.1%
280000000 | 1 | < 0.1%
270000000 | 1 | < 0.1%
260000000 | 3 | < 0.1%
258000000 | 1 | < 0.1%
255000000 | 1 | < 0.1%
250000000 | 10 | < 0.1%
245000000 | 2 | < 0.1%
237000000 | 1 | < 0.1%

genres
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 4005
Distinct (%): 9.7%
Missing: 1706
Missing (%): 4.0%
Memory size: 336.8 KiB
Drama 4945
Comedy 3602
Documentary 1773
Drama, Romance 1298
Comedy, Drama 1133
Other values (4000) 28640

Length

Max length: 76
Median length: 64
Mean length: 16.587833
Min length: 3

Characters and Unicode

Total characters: 686587
Distinct characters: 30
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2327
Unique (%): 5.6%

Sample

1st row: Animation, Comedy, Family
2nd row: Adventure, Fantasy, Family
3rd row: Romance, Comedy
4th row: Comedy, Drama, Romance
5th row: Comedy

Common Values

Value | Count | Frequency (%)
Drama | 4945 | 11.5%
Comedy | 3602 | 8.4%
Documentary | 1773 | 4.1%
Drama, Romance | 1298 | 3.0%
Comedy, Drama | 1133 | 2.6%
Horror | 958 | 2.2%
Comedy, Romance | 926 | 2.1%
Comedy, Drama, Romance | 593 | 1.4%
Drama, Comedy | 533 | 1.2%
Horror, Thriller | 526 | 1.2%
Other values (3995) | 25104 | 58.2%
(Missing) | 1706 | 4.0%
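The table counts "Comedy, Drama" and "Drama, Comedy" as separate values, which contributes to the high-cardinality alert for genres. The sketch below shows one possible order-insensitive normalization. It assumes that the order of labels carries no meaning, which the report does not confirm; the function name `normalize_genres` is hypothetical.

```python
from collections import Counter


def normalize_genres(value: str) -> str:
    """Sort the comma-separated genre labels so that their order no longer matters."""
    labels = sorted(part.strip() for part in value.split(",") if part.strip())
    return ", ".join(labels)


# Counts copied from three rows of the Common Values table for genres.
observed = {"Comedy, Drama": 1133, "Drama, Comedy": 533, "Drama": 4945}

merged = Counter()
for value, count in observed.items():
    merged[normalize_genres(value)] += count

print(merged)
```

With this normalization the two orderings collapse into a single "Comedy, Drama" entry whose count is the sum of both rows.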

Length

Histogram of lengths of the category
ValueCountFrequency (%)
drama 20105
21.7%
comedy 13073
14.1%
thriller 7596
 
8.2%
romance 6707
 
7.2%
action 6570
 
7.1%
horror 4623
 
5.0%
crime 4285
 
4.6%
adventure 3476
 
3.8%
fiction 3004
 
3.2%
science 3004
 
3.2%
Other values (12) 20169
21.8%

Most occurring characters

ValueCountFrequency (%)
r 67365
 
9.8%
a 59910
 
8.7%
e 54209
 
7.9%
m 51344
 
7.5%
51221
 
7.5%
, 47457
 
6.9%
o 46706
 
6.8%
i 38760
 
5.6%
n 33755
 
4.9%
y 27059
 
3.9%
Other values (20) 208801
30.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 494537
72.0%
Uppercase Letter 93372
 
13.6%
Space Separator 51221
 
7.5%
Other Punctuation 47457
 
6.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 67365
13.6%
a 59910
12.1%
e 54209
11.0%
m 51344
10.4%
o 46706
9.4%
i 38760
7.8%
n 33755
6.8%
y 27059
5.5%
c 26589
 
5.4%
t 24602
 
5.0%
Other values (7) 64238
13.0%
Uppercase Letter
ValueCountFrequency (%)
D 22866
24.5%
C 17358
18.6%
A 11744
12.6%
F 9520
10.2%
T 8356
 
8.9%
R 6707
 
7.2%
H 5968
 
6.4%
M 4752
 
5.1%
S 3004
 
3.2%
W 2337
 
2.5%
Space Separator
ValueCountFrequency (%)
51221
100.0%
Other Punctuation
ValueCountFrequency (%)
, 47457
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 587909
85.6%
Common 98678
 
14.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 67365
11.5%
a 59910
 
10.2%
e 54209
 
9.2%
m 51344
 
8.7%
o 46706
 
7.9%
i 38760
 
6.6%
n 33755
 
5.7%
y 27059
 
4.6%
c 26589
 
4.5%
t 24602
 
4.2%
Other values (18) 157610
26.8%
Common
ValueCountFrequency (%)
51221
51.9%
, 47457
48.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 686587
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 67365
 
9.8%
a 59910
 
8.7%
e 54209
 
7.9%
m 51344
 
7.5%
51221
 
7.5%
, 47457
 
6.9%
o 46706
 
6.8%
i 38760
 
5.6%
n 33755
 
4.9%
y 27059
 
3.9%
Other values (20) 208801
30.4%

id
Real number (ℝ)

Distinct: 42997
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 104047.32
Minimum: 2
Maximum: 469172
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 336.8 KiB

Quantile statistics

Minimum: 2
5-th percentile: 4947.6
Q1: 25195
median: 56402
Q3: 144271
95-th percentile: 354380.8
Maximum: 469172
Range: 469170
Interquartile range (IQR): 119076
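The range and IQR follow directly from the other quantile statistics (maximum minus minimum, and Q3 minus Q1). The sketch below checks both and also shows a linear-interpolation percentile on a small toy list. Linear interpolation is the pandas default, but the report does not state which method it used, so treat that as an assumption; the function name `percentile` is hypothetical.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile for q in [0, 1] on pre-sorted data."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])


# Toy data only: ten small integers, not the full id column.
toy = [2, 3, 5, 6, 11, 12, 13, 14, 15, 16]
toy_q1 = percentile(toy, 0.25)
toy_q3 = percentile(toy, 0.75)
toy_iqr = toy_q3 - toy_q1

# Consistency check on the statistics reported for id.
reported_iqr = 144271 - 25195
reported_range = 469172 - 2

print(toy_q1, toy_q3, toy_iqr, reported_iqr, reported_range)
```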

Descriptive statistics

Standard deviation: 110607.69
Coefficient of variation (CV): 1.0630518
Kurtosis: 0.78297092
Mean: 104047.32
Median Absolute Deviation (MAD): 41352
Skewness: 1.3586007
Sum: 4.4841274 × 10^9
Variance: 1.2234061 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
141971 | 9 | < 0.1%
152795 | 4 | < 0.1%
18440 | 4 | < 0.1%
25541 | 4 | < 0.1%
11115 | 4 | < 0.1%
265189 | 4 | < 0.1%
77221 | 4 | < 0.1%
23305 | 4 | < 0.1%
4912 | 4 | < 0.1%
97995 | 4 | < 0.1%
Other values (42987) | 43052 | 99.9%

Value | Count | Frequency (%)
2 | 1 | < 0.1%
3 | 1 | < 0.1%
5 | 1 | < 0.1%
6 | 1 | < 0.1%
11 | 1 | < 0.1%
12 | 1 | < 0.1%
13 | 1 | < 0.1%
14 | 1 | < 0.1%
15 | 1 | < 0.1%
16 | 1 | < 0.1%

Value | Count | Frequency (%)
469172 | 1 | < 0.1%
468707 | 1 | < 0.1%
467731 | 1 | < 0.1%
465044 | 1 | < 0.1%
464207 | 1 | < 0.1%
464111 | 1 | < 0.1%
463906 | 1 | < 0.1%
463800 | 1 | < 0.1%
462788 | 1 | < 0.1%
462108 | 1 | < 0.1%

original_language
Categorical

HIGH CARDINALITY  IMBALANCE 

Distinct: 88
Distinct (%): 0.2%
Missing: 7
Missing (%): < 0.1%
Memory size: 336.8 KiB
en 30354
fr 2367
it 1510
ja 1313
de 1040
Other values (83) 6506

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 86180
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): < 0.1%

Sample

1st row: en
2nd row: en
3rd row: en
4th row: en
5th row: en

Common Values

Value | Count | Frequency (%)
en | 30354 | 70.4%
fr | 2367 | 5.5%
it | 1510 | 3.5%
ja | 1313 | 3.0%
de | 1040 | 2.4%
es | 945 | 2.2%
ru | 772 | 1.8%
hi | 505 | 1.2%
ko | 441 | 1.0%
zh | 400 | 0.9%
Other values (78) | 3443 | 8.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
en 30354
70.4%
fr 2367
 
5.5%
it 1510
 
3.5%
ja 1313
 
3.0%
de 1040
 
2.4%
es 945
 
2.2%
ru 772
 
1.8%
hi 505
 
1.2%
ko 441
 
1.0%
zh 400
 
0.9%
Other values (78) 3443
 
8.0%

Most occurring characters

ValueCountFrequency (%)
e 32583
37.8%
n 31041
36.0%
r 3499
 
4.1%
f 2735
 
3.2%
i 2340
 
2.7%
t 2210
 
2.6%
a 1792
 
2.1%
s 1584
 
1.8%
j 1314
 
1.5%
d 1284
 
1.5%
Other values (16) 5798
 
6.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 86180
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 32583
37.8%
n 31041
36.0%
r 3499
 
4.1%
f 2735
 
3.2%
i 2340
 
2.7%
t 2210
 
2.6%
a 1792
 
2.1%
s 1584
 
1.8%
j 1314
 
1.5%
d 1284
 
1.5%
Other values (16) 5798
 
6.7%

Most occurring scripts

ValueCountFrequency (%)
Latin 86180
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 32583
37.8%
n 31041
36.0%
r 3499
 
4.1%
f 2735
 
3.2%
i 2340
 
2.7%
t 2210
 
2.6%
a 1792
 
2.1%
s 1584
 
1.8%
j 1314
 
1.5%
d 1284
 
1.5%
Other values (16) 5798
 
6.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 86180
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 32583
37.8%
n 31041
36.0%
r 3499
 
4.1%
f 2735
 
3.2%
i 2340
 
2.7%
t 2210
 
2.6%
a 1792
 
2.1%
s 1584
 
1.8%
j 1314
 
1.5%
d 1284
 
1.5%
Other values (16) 5798
 
6.7%

overview
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct: 41966
Distinct (%): 99.4%
Missing: 873
Missing (%): 2.0%
Memory size: 336.8 KiB
No overview found. 129
Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. 9
King Lear, old and tired, divides his kingdom among his daughters, giving great importance to their protestations of love for him. When Cordelia, youngest and most honest, refuses to idly flatter the old man in return for favor, he banishes her and turns for support to his remaining daughters. But Goneril and Regan have no love for him and instead plot to take all his power from him. In a parallel, Lear's loyal courtier Gloucester favors his illegitimate son Edmund after being told lies about his faithful son Edgar. Madness and tragedy befall both ill-starred fathers. 5
Former Danish servicemen Lars and Jimmy are thrown together while training in a neo-Nazi group. Moving from hostility through grudging admiration to friendship and finally passion, events take a darker turn when their illicit relationship is uncovered. 4
British nurse Catherine Barkley (Helen Hayes) and American Lieutenant Frederic Henry (Gary Cooper) fall in love during the First World War in Italy. Eventually separated by Frederic's transfer, tremendous challenges and difficult decisions face each, as the war rages on. Academy Awards winner for Best Cinematography and for Best Sound, Recording. Nominated for Best Picture and for Best Art Direction. 4
Other values (41961) 42073

Length

Max length: 1000
Median length: 788
Mean length: 322.30736
Min length: 1

Characters and Unicode

Total characters: 13609106
Distinct characters: 416
Distinct categories: 25
Distinct scripts: 13
Distinct blocks: 21
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 41902
Unique (%): 99.2%

Sample

1st row: Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.
2nd row: When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures.
3rd row: A family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max.
4th row: Cheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe.
5th row: Just when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own.

Common Values

Value | Count | Frequency (%)
No overview found. | 129 | 0.3%
Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | 9 | < 0.1%
King Lear, old and tired, divides his kingdom among his daughters, giving great importance to their protestations of love for him. When Cordelia, youngest and most honest, refuses to idly flatter the old man in return for favor, he banishes her and turns for support to his remaining daughters. But Goneril and Regan have no love for him and instead plot to take all his power from him. In a parallel, Lear's loyal courtier Gloucester favors his illegitimate son Edmund after being told lies about his faithful son Edgar. Madness and tragedy befall both ill-starred fathers. | 5 | < 0.1%
Former Danish servicemen Lars and Jimmy are thrown together while training in a neo-Nazi group. Moving from hostility through grudging admiration to friendship and finally passion, events take a darker turn when their illicit relationship is uncovered. | 4 | < 0.1%
British nurse Catherine Barkley (Helen Hayes) and American Lieutenant Frederic Henry (Gary Cooper) fall in love during the First World War in Italy. Eventually separated by Frederic's transfer, tremendous challenges and difficult decisions face each, as the war rages on. Academy Awards winner for Best Cinematography and for Best Sound, Recording. Nominated for Best Picture and for Best Art Direction. | 4 | < 0.1%
More than two decades after catapulting to stardom with The Princess Bride, an aging actress (Robin Wright, playing a version of herself) decides to take her final job: preserving her digital likeness for a future Hollywood. Through a deal brokered by her loyal, longtime agent and the head of Miramount Studios, her alias will be controlled by the studio, and will star in any film they want with no restrictions. In return, she receives healthy compensation so she can care for her ailing son and her digitized character will stay forever young. Twenty years later, under the creative vision of the studio’s head animator, Wright’s digital double rises to immortal stardom. With her contract expiring, she is invited to take part in “The Congress” convention as she makes her comeback straight into the world of future fantasy cinema. | 4 | < 0.1%
On the Arabian Peninsula in the 1930s, two warring leaders come face to face. The victorious Nesib, Emir of Hobeika, lays down his peace terms to rival Amar, Sultan of Salmaah. The two men agree that neither can lay claim to the area of no man’s land between them called The Yellow Belt. In return, Nesib adopts Amar’s two boys Saleeh and Auda as a guarantee against invasion. Twelve years later, Saleeh and Auda have grown into young men. Saleeh, the warrior, itches to escape his gilded cage and return to his father’s land. Auda cares only for books and the pursuit of knowledge. One day, their adopted father Nesib is visited by an American from Texas. He tells the Emir that his land is blessed with oil and promises him riches beyond his wildest imagination. Nesib imagines a realm of infinite possibility, a kingdom with roads, schools and hospitals all paid for by the black gold beneath the barren sand. There is only one problem. The precious oil is located in the Yellow Belt. | 4 | < 0.1%
In feudal India, a warrior (Khan) who renounces his role as the longtime enforcer to a local lord becomes the prey in a murderous hunt through the Himalayan mountains. | 4 | < 0.1%
Hitman Jef Costello is a perfectionist who always carefully plans his murders and who never gets caught. | 4 | < 0.1%
The third film of Frank Capra's 'Why We Fight" propaganda film series, dealing with the Nazi conquest of Western Europe in 1940. | 4 | < 0.1%
Other values (41956) | 42053 | 97.6%
(Missing) | 873 | 2.0%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 129541
 
5.6%
a 94653
 
4.1%
and 70958
 
3.1%
to 70485
 
3.0%
of 64743
 
2.8%
in 45535
 
2.0%
his 35171
 
1.5%
is 35055
 
1.5%
with 22880
 
1.0%
her 21060
 
0.9%
Other values (93219) 1733684
74.6%

Most occurring characters

ValueCountFrequency (%)
2283408
16.8%
e 1293273
 
9.5%
a 891306
 
6.5%
t 884105
 
6.5%
i 805386
 
5.9%
o 784660
 
5.8%
n 778611
 
5.7%
s 728670
 
5.4%
r 705631
 
5.2%
h 571750
 
4.2%
Other values (406) 3882306
28.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 10560913
77.6%
Space Separator 2283445
 
16.8%
Uppercase Letter 369441
 
2.7%
Other Punctuation 296844
 
2.2%
Decimal Number 38519
 
0.3%
Dash Punctuation 34749
 
0.3%
Close Punctuation 9745
 
0.1%
Open Punctuation 9727
 
0.1%
Final Punctuation 4166
 
< 0.1%
Initial Punctuation 772
 
< 0.1%
Other values (15) 785
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1293273
12.2%
a 891306
 
8.4%
t 884105
 
8.4%
i 805386
 
7.6%
o 784660
 
7.4%
n 778611
 
7.4%
s 728670
 
6.9%
r 705631
 
6.7%
h 571750
 
5.4%
l 453602
 
4.3%
Other values (142) 2663919
25.2%
Uppercase Letter
ValueCountFrequency (%)
A 40406
 
10.9%
T 33684
 
9.1%
S 29400
 
8.0%
M 22724
 
6.2%
B 22592
 
6.1%
C 21616
 
5.9%
H 18569
 
5.0%
W 17762
 
4.8%
I 15669
 
4.2%
D 15409
 
4.2%
Other values (76) 131610
35.6%
Other Letter
ValueCountFrequency (%)
5
 
4.6%
5
 
4.6%
5
 
4.6%
3
 
2.8%
3
 
2.8%
3
 
2.8%
3
 
2.8%
2
 
1.8%
2
 
1.8%
2
 
1.8%
Other values (65) 76
69.7%
Other Punctuation
ValueCountFrequency (%)
, 126545
42.6%
. 118748
40.0%
' 29742
 
10.0%
" 10905
 
3.7%
: 3021
 
1.0%
? 2549
 
0.9%
; 2314
 
0.8%
! 1462
 
0.5%
/ 686
 
0.2%
& 423
 
0.1%
Other values (12) 449
 
0.2%
Nonspacing Mark
ValueCountFrequency (%)
ి 4
12.5%
́ 4
12.5%
3
9.4%
̈ 3
9.4%
3
9.4%
3
9.4%
2
6.2%
2
6.2%
2
6.2%
2
6.2%
Other values (4) 4
12.5%
Decimal Number
ValueCountFrequency (%)
1 8981
23.3%
0 7403
19.2%
9 5917
15.4%
2 3820
9.9%
5 2221
 
5.8%
8 2190
 
5.7%
3 2173
 
5.6%
4 1998
 
5.2%
7 1939
 
5.0%
6 1877
 
4.9%
Spacing Mark
ValueCountFrequency (%)
9
39.1%
3
 
13.0%
3
 
13.0%
3
 
13.0%
2
 
8.7%
ि 1
 
4.3%
1
 
4.3%
ி 1
 
4.3%
Dash Punctuation
ValueCountFrequency (%)
- 33347
96.0%
823
 
2.4%
570
 
1.6%
5
 
< 0.1%
4
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
® 41
71.9%
12
 
21.1%
¦ 2
 
3.5%
° 1
 
1.8%
1
 
1.8%
Math Symbol
ValueCountFrequency (%)
~ 18
54.5%
+ 6
 
18.2%
= 6
 
18.2%
| 2
 
6.1%
1
 
3.0%
Open Punctuation
ValueCountFrequency (%)
( 9676
99.5%
[ 48
 
0.5%
{ 2
 
< 0.1%
1
 
< 0.1%
Currency Symbol
ValueCountFrequency (%)
$ 302
96.5%
£ 9
 
2.9%
1
 
0.3%
1
 
0.3%
Space Separator
ValueCountFrequency (%)
2283408
> 99.9%
  35
 
< 0.1%
  2
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 9695
99.5%
] 48
 
0.5%
} 2
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
3549
85.2%
598
 
14.4%
» 19
 
0.5%
Initial Punctuation
ValueCountFrequency (%)
577
74.7%
177
 
22.9%
« 18
 
2.3%
Modifier Symbol
ValueCountFrequency (%)
´ 25
67.6%
` 11
29.7%
¯ 1
 
2.7%
Control
ValueCountFrequency (%)
83
96.5%
’ 3
 
3.5%
Format
ValueCountFrequency (%)
29
59.2%
­ 20
40.8%
Other Number
ValueCountFrequency (%)
¹ 8
50.0%
½ 8
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 17
100.0%
Line Separator
ValueCountFrequency (%)
7
100.0%
Paragraph Separator
ValueCountFrequency (%)
2
100.0%
Letter Number
ValueCountFrequency (%)
2
100.0%
Modifier Letter
ValueCountFrequency (%)
ʼ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 10925122
80.3%
Common 2678586
 
19.7%
Cyrillic 4587
 
< 0.1%
Greek 648
 
< 0.1%
Devanagari 67
 
< 0.1%
Telugu 30
 
< 0.1%
Tamil 19
 
< 0.1%
Hiragana 15
 
< 0.1%
Hangul 9
 
< 0.1%
Thai 8
 
< 0.1%
Other values (3) 15
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1293273
11.8%
a 891306
 
8.2%
t 884105
 
8.1%
i 805386
 
7.4%
o 784660
 
7.2%
n 778611
 
7.1%
s 728670
 
6.7%
r 705631
 
6.5%
h 571750
 
5.2%
l 453602
 
4.2%
Other values (131) 3028128
27.7%
Common
ValueCountFrequency (%)
2283408
85.2%
, 126545
 
4.7%
. 118748
 
4.4%
- 33347
 
1.2%
' 29742
 
1.1%
" 10905
 
0.4%
) 9695
 
0.4%
( 9676
 
0.4%
1 8981
 
0.3%
0 7403
 
0.3%
Other values (70) 40136
 
1.5%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Greek
ValueCountFrequency (%)
α 60
 
9.3%
ο 55
 
8.5%
τ 43
 
6.6%
ι 36
 
5.6%
η 36
 
5.6%
ν 34
 
5.2%
ρ 31
 
4.8%
ε 31
 
4.8%
π 30
 
4.6%
ς 30
 
4.6%
Other values (33) 262
40.4%
Devanagari
ValueCountFrequency (%)
9
 
13.4%
5
 
7.5%
5
 
7.5%
5
 
7.5%
3
 
4.5%
3
 
4.5%
3
 
4.5%
3
 
4.5%
3
 
4.5%
2
 
3.0%
Other values (20) 26
38.8%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Hiragana
ValueCountFrequency (%)
3
20.0%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
Other values (3) 3
20.0%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Han
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Inherited
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 13592044
99.9%
Punctuation 6631
 
< 0.1%
None 5658
 
< 0.1%
Cyrillic 4587
 
< 0.1%
Devanagari 67
 
< 0.1%
Telugu 30
 
< 0.1%
Tamil 19
 
< 0.1%
Hiragana 15
 
< 0.1%
Letterlike Symbols 12
 
< 0.1%
Hangul 9
 
< 0.1%
Other values (11) 34
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2283408
16.8%
e 1293273
 
9.5%
a 891306
 
6.6%
t 884105
 
6.5%
i 805386
 
5.9%
o 784660
 
5.8%
n 778611
 
5.7%
s 728670
 
5.4%
r 705631
 
5.2%
h 571750
 
4.2%
Other values (82) 3865244
28.4%
Punctuation
ValueCountFrequency (%)
3549
53.5%
823
 
12.4%
598
 
9.0%
577
 
8.7%
570
 
8.6%
287
 
4.3%
177
 
2.7%
29
 
0.4%
7
 
0.1%
5
 
0.1%
Other values (4) 9
 
0.1%
None
ValueCountFrequency (%)
é 1503
26.6%
á 272
 
4.8%
ä 260
 
4.6%
í 233
 
4.1%
ö 222
 
3.9%
è 189
 
3.3%
ü 168
 
3.0%
ı 161
 
2.8%
ó 158
 
2.8%
ç 153
 
2.7%
Other values (139) 2339
41.3%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Letterlike Symbols
ValueCountFrequency (%)
12
100.0%
Devanagari
ValueCountFrequency (%)
9
 
13.4%
5
 
7.5%
5
 
7.5%
5
 
7.5%
3
 
4.5%
3
 
4.5%
3
 
4.5%
3
 
4.5%
3
 
4.5%
2
 
3.0%
Other values (20) 26
38.8%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Diacriticals
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%
Hiragana
ValueCountFrequency (%)
3
20.0%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
1
 
6.7%
Other values (3) 3
20.0%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Alphabetic PF
ValueCountFrequency (%)
2
100.0%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Number Forms
ValueCountFrequency (%)
2
100.0%
Modifier Letters
ValueCountFrequency (%)
ʼ 2
100.0%
Katakana
ValueCountFrequency (%)
1
100.0%
Math Operators
ValueCountFrequency (%)
1
100.0%
CJK
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Currency Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
Specials
ValueCountFrequency (%)
1
100.0%

popularity
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct 41873
Distinct (%) 97.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 3.0686855
Minimum 0
Maximum 547.4883
Zeros 26
Zeros (%) 0.1%
Negative 0
Negative (%) 0.0%
Memory size 336.8 KiB

Quantile statistics

Minimum 0
5-th percentile 0.038963
Q1 0.45402
median 1.233068
Q3 4.01643
95-th percentile 11.224229
Maximum 547.4883
Range 547.4883
Interquartile range (IQR) 3.56241
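
The interquartile range in the table above is Q3 minus Q1. As a minimal sketch, the snippet below recomputes it from the reported quartiles and derives the conventional Tukey upper fence (Q3 + 1.5 × IQR). The fence rule is a common convention brought in here for illustration; it is an assumption, not something the report itself applies.

```python
# Quartiles and maximum copied from the popularity quantile table above.
q1 = 0.45402
q3 = 4.01643
maximum = 547.4883

# Interquartile range: the spread of the middle 50% of the values.
iqr = q3 - q1

# Tukey's rule of thumb (an assumption here, not part of the report):
# values above q3 + 1.5 * iqr are often treated as outlier candidates.
upper_fence = q3 + 1.5 * iqr

print(f"IQR = {iqr:.5f}")
print(f"upper fence = {upper_fence:.6f}")
print(f"maximum / upper fence = {maximum / upper_fence:.1f}")
```

Because the 95-th percentile (11.224229) already lies above this fence, more than 5% of the rows would be flagged under that rule, which is consistent with the SKEWED alert on this variable.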

Descriptive statistics

Standard deviation 6.1339484
Coefficient of variation (CV) 1.9988847
Kurtosis 1863.4343
Mean 3.0686855
Median Absolute Deviation (MAD) 1.023568
Skewness 28.862918
Sum 132251.14
Variance 37.625322
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
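
With 50 fixed-size bins over such a long-tailed range, almost all of the mass falls into the first bin. The sketch below works out the bin width from the reported minimum and maximum, assuming equal-width bins that span exactly that range (the usual behaviour for an integer bin count, but an assumption about how this plot was drawn).

```python
# Values copied from the popularity statistics in this report.
minimum = 0.0
maximum = 547.4883
q3 = 4.01643
p95 = 11.224229
bins = 50

# Equal-width bins spanning [minimum, maximum] (assumed binning scheme).
bin_width = (maximum - minimum) / bins
first_bin_edge = minimum + bin_width

print(f"bin width = {bin_width:.6f}")
print(f"Q3 inside first bin: {q3 < first_bin_edge}")
print(f"95th percentile inside first bin: {p95 < first_bin_edge}")
```

Q3 sits well inside the first bin while the 95-th percentile falls just past its edge, so roughly 75% to 95% of the rows share a single bar. A logarithmic axis or a clipped range would show more of the distribution's shape.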
Value Count Frequency (%)
1 × 10⁻⁶ 47 0.1%
0.000578 37 0.1%
0.000308 36 0.1%
0.000844 35 0.1%
0.00022 34 0.1%
0 26 0.1%
0.002001 25 0.1%
0.003013 18 < 0.1%
0.000603 15 < 0.1%
0.001393 14 < 0.1%
Other values (41863) 42810 99.3%

Value Count Frequency (%)
0 26 0.1%
1 × 10⁻⁶ 47 0.1%
2 × 10⁻⁶ 5 < 0.1%
3 × 10⁻⁶ 3 < 0.1%
4 × 10⁻⁶ 4 < 0.1%
5 × 10⁻⁶ 1 < 0.1%
6 × 10⁻⁶ 2 < 0.1%
8 × 10⁻⁶ 4 < 0.1%
9 × 10⁻⁶ 2 < 0.1%
1.1 × 10⁻⁵ 5 < 0.1%

Value Count Frequency (%)
547.488298 1 < 0.1%
294.337037 1 < 0.1%
287.253654 1 < 0.1%
228.032744 1 < 0.1%
213.849907 1 < 0.1%
187.860492 1 < 0.1%
185.330992 1 < 0.1%
185.070892 1 < 0.1%
183.870374 1 < 0.1%
154.801009 1 < 0.1%

production_companies
Categorical

HIGH CARDINALITY  MISSING 

Distinct 22367
Distinct (%) 67.5%
Missing 9985
Missing (%) 23.2%
Memory size 336.8 KiB
Metro-Goldwyn-Mayer (MGM)
 
739
Warner Bros.
 
539
Paramount Pictures
 
507
Twentieth Century Fox Film Corporation
 
439
Universal Pictures
 
320
Other values (22362)
30568 

Length

Max length 609
Median length 412
Mean length 41.744624
Min length 2

Characters and Unicode

Total characters 1382248
Distinct characters 294
Distinct categories 17
Distinct scripts 6
Distinct blocks 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
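
As a rough illustration of how such per-category counts can be derived, the sketch below tallies Unicode general categories with Python's standard `unicodedata` module. The `CATEGORY_NAMES` mapping and the `category_counts` helper are hypothetical names introduced here, and the three input strings are taken from values shown in this report; this is not the profiler's own implementation.

```python
import unicodedata
from collections import Counter

# Readable labels for a few two-letter general-category codes.
# Codes not listed here are reported under their raw code.
CATEGORY_NAMES = {
    "Lu": "Uppercase Letter",
    "Ll": "Lowercase Letter",
    "Zs": "Space Separator",
    "Po": "Other Punctuation",
    "Pd": "Dash Punctuation",
    "Ps": "Open Punctuation",
    "Pe": "Close Punctuation",
    "Nd": "Decimal Number",
}


def category_counts(values):
    """Count characters per Unicode general category across all strings."""
    counts = Counter()
    for value in values:
        for ch in value:
            code = unicodedata.category(ch)
            counts[CATEGORY_NAMES.get(code, code)] += 1
    return counts


sample = [
    "Pixar Animation Studios",
    "Warner Bros., Lancaster Gate",
    "Metro-Goldwyn-Mayer (MGM)",
]
counts = category_counts(sample)
for name, count in counts.most_common():
    print(name, count)
```

Run over the whole column, a tally like this would yield the kind of "Most occurring categories" table shown for this variable.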

Unique

Unique 20057
Unique (%) 60.6%

Sample

1st row Pixar Animation Studios
2nd row TriStar Pictures, Teitler Film, Interscope Communications
3rd row Warner Bros., Lancaster Gate
4th row Twentieth Century Fox Film Corporation
5th row Sandollar Productions, Touchstone Pictures
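
The sample rows show that a single cell can hold several comma-separated companies, which helps explain the very high distinct count. A minimal sketch of splitting such cells into individual names follows; `split_companies` is a hypothetical helper, and it assumes a comma never appears inside a company name, which may not hold for every row.

```python
from collections import Counter

# The five sample rows shown above.
sample_rows = [
    "Pixar Animation Studios",
    "TriStar Pictures, Teitler Film, Interscope Communications",
    "Warner Bros., Lancaster Gate",
    "Twentieth Century Fox Film Corporation",
    "Sandollar Productions, Touchstone Pictures",
]


def split_companies(cell):
    """Split one cell on commas and trim whitespace (assumes no commas in names)."""
    return [name.strip() for name in cell.split(",") if name.strip()]


# Count each company once per row in which it appears.
company_counts = Counter(
    name for row in sample_rows for name in split_companies(row)
)

print(len(sample_rows), "rows ->", len(company_counts), "distinct companies")
```

Counting per company rather than per cell would merge combinations such as "Warner Bros., Lancaster Gate" with the plain "Warner Bros." entry.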

Common Values

ValueCountFrequency (%)
Metro-Goldwyn-Mayer (MGM) 739
 
1.7%
Warner Bros. 539
 
1.3%
Paramount Pictures 507
 
1.2%
Twentieth Century Fox Film Corporation 439
 
1.0%
Universal Pictures 320
 
0.7%
RKO Radio Pictures 247
 
0.6%
Columbia Pictures Corporation 207
 
0.5%
Columbia Pictures 146
 
0.3%
Mosfilm 145
 
0.3%
Walt Disney Pictures 82
 
0.2%
Other values (22357) 29741
69.0%
(Missing) 9985
 
23.2%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
films 9374
 
5.3%
pictures 9246
 
5.2%
productions 9001
 
5.1%
film 6616
 
3.8%
entertainment 5140
 
2.9%
corporation 2184
 
1.2%
company 1737
 
1.0%
warner 1476
 
0.8%
bros 1409
 
0.8%
the 1378
 
0.8%
Other values (18400) 128568
73.0%

Most occurring characters

ValueCountFrequency (%)
143024
 
10.3%
i 106025
 
7.7%
e 93945
 
6.8%
n 89299
 
6.5%
o 84561
 
6.1%
r 82993
 
6.0%
t 82843
 
6.0%
a 76521
 
5.5%
s 62152
 
4.5%
l 50823
 
3.7%
Other values (284) 510062
36.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 979000
70.8%
Uppercase Letter 197206
 
14.3%
Space Separator 143029
 
10.3%
Other Punctuation 44948
 
3.3%
Decimal Number 4344
 
0.3%
Dash Punctuation 4300
 
0.3%
Open Punctuation 4288
 
0.3%
Close Punctuation 4287
 
0.3%
Math Symbol 664
 
< 0.1%
Other Letter 140
 
< 0.1%
Other values (7) 42
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 106025
10.8%
e 93945
9.6%
n 89299
9.1%
o 84561
8.6%
r 82993
8.5%
t 82843
8.5%
a 76521
 
7.8%
s 62152
 
6.3%
l 50823
 
5.2%
m 43874
 
4.5%
Other values (102) 205964
21.0%
Other Letter
ValueCountFrequency (%)
9
 
6.4%
8
 
5.7%
6
 
4.3%
5
 
3.6%
5
 
3.6%
5
 
3.6%
5
 
3.6%
5
 
3.6%
4
 
2.9%
3
 
2.1%
Other values (62) 85
60.7%
Uppercase Letter
ValueCountFrequency (%)
P 27686
14.0%
F 26112
13.2%
C 20469
 
10.4%
M 13275
 
6.7%
S 11754
 
6.0%
E 9676
 
4.9%
A 9449
 
4.8%
T 9291
 
4.7%
B 8885
 
4.5%
G 7753
 
3.9%
Other values (52) 52856
26.8%
Other Punctuation
ValueCountFrequency (%)
, 37243
82.9%
. 5654
 
12.6%
& 749
 
1.7%
/ 642
 
1.4%
' 448
 
1.0%
" 131
 
0.3%
! 36
 
0.1%
% 18
 
< 0.1%
: 9
 
< 0.1%
@ 5
 
< 0.1%
Other values (6) 13
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
2 1039
23.9%
1 714
16.4%
0 641
14.8%
3 552
12.7%
4 474
10.9%
9 203
 
4.7%
6 197
 
4.5%
5 177
 
4.1%
7 174
 
4.0%
8 173
 
4.0%
Open Punctuation
ValueCountFrequency (%)
( 4278
99.8%
[ 9
 
0.2%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 4277
99.8%
] 9
 
0.2%
1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
143024
> 99.9%
  5
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 4299
> 99.9%
1
 
< 0.1%
Math Symbol
ValueCountFrequency (%)
+ 663
99.8%
| 1
 
0.2%
Other Symbol
ValueCountFrequency (%)
° 20
90.9%
2
 
9.1%
Final Punctuation
ValueCountFrequency (%)
» 3
50.0%
3
50.0%
Other Number
ValueCountFrequency (%)
² 1
50.0%
½ 1
50.0%
Control
ValueCountFrequency (%)
4
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 4
100.0%
Initial Punctuation
ValueCountFrequency (%)
« 3
100.0%
Format
ValueCountFrequency (%)
1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1175817
85.1%
Common 205900
 
14.9%
Cyrillic 359
 
< 0.1%
Hangul 115
 
< 0.1%
Greek 31
 
< 0.1%
Han 26
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 106025
 
9.0%
e 93945
 
8.0%
n 89299
 
7.6%
o 84561
 
7.2%
r 82993
 
7.1%
t 82843
 
7.0%
a 76521
 
6.5%
s 62152
 
5.3%
l 50823
 
4.3%
m 43874
 
3.7%
Other values (99) 402781
34.3%
Hangul
ValueCountFrequency (%)
9
 
7.8%
8
 
7.0%
6
 
5.2%
5
 
4.3%
5
 
4.3%
5
 
4.3%
5
 
4.3%
5
 
4.3%
4
 
3.5%
3
 
2.6%
Other values (43) 60
52.2%
Common
ValueCountFrequency (%)
143024
69.5%
, 37243
 
18.1%
. 5654
 
2.7%
- 4299
 
2.1%
( 4278
 
2.1%
) 4277
 
2.1%
2 1039
 
0.5%
& 749
 
0.4%
1 714
 
0.3%
+ 663
 
0.3%
Other values (37) 3960
 
1.9%
Cyrillic
ValueCountFrequency (%)
и 33
 
9.2%
о 27
 
7.5%
а 26
 
7.2%
л 20
 
5.6%
н 20
 
5.6%
м 17
 
4.7%
с 16
 
4.5%
е 16
 
4.5%
т 16
 
4.5%
ь 14
 
3.9%
Other values (36) 154
42.9%
Greek
ValueCountFrequency (%)
ν 3
 
9.7%
ο 3
 
9.7%
τ 2
 
6.5%
Κ 2
 
6.5%
ι 2
 
6.5%
η 2
 
6.5%
λ 2
 
6.5%
Ε 2
 
6.5%
ρ 2
 
6.5%
μ 1
 
3.2%
Other values (10) 10
32.3%
Han
ValueCountFrequency (%)
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (9) 9
34.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1376115
99.6%
None 5629
 
0.4%
Cyrillic 359
 
< 0.1%
Hangul 113
 
< 0.1%
CJK 26
 
< 0.1%
Punctuation 6
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
143024
 
10.4%
i 106025
 
7.7%
e 93945
 
6.8%
n 89299
 
6.5%
o 84561
 
6.1%
r 82993
 
6.0%
t 82843
 
6.0%
a 76521
 
5.6%
s 62152
 
4.5%
l 50823
 
3.7%
Other values (77) 503929
36.6%
None
ValueCountFrequency (%)
é 3151
56.0%
ó 411
 
7.3%
á 307
 
5.5%
í 169
 
3.0%
ü 150
 
2.7%
ñ 149
 
2.6%
ô 140
 
2.5%
ä 137
 
2.4%
ö 134
 
2.4%
ç 125
 
2.2%
Other values (76) 756
 
13.4%
Cyrillic
ValueCountFrequency (%)
и 33
 
9.2%
о 27
 
7.5%
а 26
 
7.2%
л 20
 
5.6%
н 20
 
5.6%
м 17
 
4.7%
с 16
 
4.5%
е 16
 
4.5%
т 16
 
4.5%
ь 14
 
3.9%
Other values (36) 154
42.9%
Hangul
ValueCountFrequency (%)
9
 
8.0%
8
 
7.1%
6
 
5.3%
5
 
4.4%
5
 
4.4%
5
 
4.4%
5
 
4.4%
5
 
4.4%
4
 
3.5%
3
 
2.7%
Other values (42) 58
51.3%
Punctuation
ValueCountFrequency (%)
3
50.0%
1
 
16.7%
1
 
16.7%
1
 
16.7%
CJK
ValueCountFrequency (%)
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
2
 
7.7%
1
 
3.8%
1
 
3.8%
1
 
3.8%
Other values (9) 9
34.6%

production_countries
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct 2311
Distinct (%) 6.1%
Missing 4914
Missing (%) 11.4%
Memory size 336.8 KiB
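
The figures above suggest that Distinct (%) is taken over the non-missing rows rather than over all 43097 observations listed in the report overview. This is an inference from the numbers, not something the report states; the arithmetic below is a quick way to check it.

```python
# Counts copied from this report.
n_rows = 43097       # "Number of observations" in the overview
n_missing = 4914     # missing production_countries values
n_distinct = 2311    # distinct production_countries values

n_present = n_rows - n_missing

# Two candidate denominators for the reported percentage.
distinct_pct_present = 100 * n_distinct / n_present
distinct_pct_all = 100 * n_distinct / n_rows

print(f"over non-missing rows: {distinct_pct_present:.1f}%")
print(f"over all rows:         {distinct_pct_all:.1f}%")
```

Only the non-missing denominator reproduces the reported 6.1%.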
United States of America
17492 
United Kingdom
2166 
France
 
1586
Japan
 
1313
Italy
 
1025
Other values (2306)
14601 

Length

Max length 237
Median length 167
Mean length 19.10468
Min length 4

Characters and Unicode

Total characters 729474
Distinct characters 53
Distinct categories 4
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1700
Unique (%) 4.5%

Sample

1st row United States of America
2nd row United States of America
3rd row United States of America
4th row United States of America
5th row United States of America

Common Values

ValueCountFrequency (%)
United States of America 17492
40.6%
United Kingdom 2166
 
5.0%
France 1586
 
3.7%
Japan 1313
 
3.0%
Italy 1025
 
2.4%
Canada 795
 
1.8%
India 729
 
1.7%
Germany 725
 
1.7%
Russia 705
 
1.6%
United Kingdom, United States of America 567
 
1.3%
Other values (2301) 11080
25.7%
(Missing) 4914
 
11.4%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
united 24755
21.3%
states 20731
17.8%
of 20730
17.8%
america 20730
17.8%
kingdom 3997
 
3.4%
france 3842
 
3.3%
germany 2208
 
1.9%
italy 2162
 
1.9%
canada 1701
 
1.5%
japan 1590
 
1.4%
Other values (176) 13731
11.8%

Most occurring characters

ValueCountFrequency (%)
e 78920
 
10.8%
77994
 
10.7%
t 71135
 
9.8%
a 68801
 
9.4%
i 57233
 
7.8%
n 46333
 
6.4%
d 33771
 
4.6%
r 31731
 
4.3%
o 28977
 
4.0%
m 28109
 
3.9%
Other values (43) 206470
28.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 546038
74.9%
Uppercase Letter 95409
 
13.1%
Space Separator 77994
 
10.7%
Other Punctuation 10033
 
1.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 78920
14.5%
t 71135
13.0%
a 68801
12.6%
i 57233
10.5%
n 46333
8.5%
d 33771
6.2%
r 31731
5.8%
o 28977
 
5.3%
m 28109
 
5.1%
c 25810
 
4.7%
Other values (16) 75218
13.8%
Uppercase Letter
ValueCountFrequency (%)
U 24848
26.0%
S 23356
24.5%
A 21914
23.0%
K 5118
 
5.4%
F 4214
 
4.4%
I 3552
 
3.7%
C 2491
 
2.6%
G 2413
 
2.5%
J 1606
 
1.7%
R 1249
 
1.3%
Other values (14) 4648
 
4.9%
Other Punctuation
ValueCountFrequency (%)
, 10028
> 99.9%
' 5
 
< 0.1%
Space Separator
ValueCountFrequency (%)
77994
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 641447
87.9%
Common 88027
 
12.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 78920
12.3%
t 71135
11.1%
a 68801
10.7%
i 57233
 
8.9%
n 46333
 
7.2%
d 33771
 
5.3%
r 31731
 
4.9%
o 28977
 
4.5%
m 28109
 
4.4%
c 25810
 
4.0%
Other values (40) 170627
26.6%
Common
ValueCountFrequency (%)
77994
88.6%
, 10028
 
11.4%
' 5
 
< 0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 729474
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 78920
 
10.8%
77994
 
10.7%
t 71135
 
9.8%
a 68801
 
9.4%
i 57233
 
7.8%
n 46333
 
6.4%
d 33771
 
4.6%
r 31731
 
4.3%
o 28977
 
4.0%
m 28109
 
3.9%
Other values (43) 206470
28.3%
Distinct 17012
Distinct (%) 39.5%
Missing 0
Missing (%) 0.0%
Memory size 336.8 KiB
Minimum 1878-06-14 00:00:00
Maximum 2020-12-16 00:00:00
Histogram with fixed size bins (bins=50)

revenue
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct 6833
Distinct (%) 15.9%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 11829212
Minimum 0
Maximum 2.7879651 × 10⁹
Zeros 35703
Zeros (%) 82.8%
Negative 0
Negative (%) 0.0%
Memory size 336.8 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 52234142
Maximum 2.7879651 × 10⁹
Range 2.7879651 × 10⁹
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 66018497
Coefficient of variation (CV) 5.5809715
Kurtosis 225.30685
Mean 11829212
Median Absolute Deviation (MAD) 0
Skewness 11.944373
Sum 5.0980357 × 10¹¹
Variance 4.358442 × 10¹⁵
Monotonicity Not monotonic
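
With 82.8% of the rows equal to zero and no missing values, the zeros dominate every quantile up to Q3 and pull the mean down. The sketch below shows how different the mean looks if zeros are read as "revenue not reported" rather than as a true zero. That reading is an interpretive assumption, not something the report states, and the result is approximate because the reported sum is rounded.

```python
# Figures copied from the revenue statistics in this report.
n_rows = 43097            # "Number of observations" in the overview
zeros = 35703
total = 5.0980357e11      # reported Sum (rounded)
mean_all = 11829212       # reported Mean over all rows

zero_share = zeros / n_rows
n_nonzero = n_rows - zeros

# Mean over rows with a non-zero revenue only (assumes zeros mean "unknown").
mean_nonzero = total / n_nonzero

print(f"zero share        = {100 * zero_share:.1f}%")
print(f"mean, all rows    = {mean_all:,.0f}")
print(f"mean, non-zero    = {mean_nonzero:,.0f}")
```

Under that assumption the typical reported revenue is several times the headline mean, so any model or correlation using this column should decide explicitly how zeros are handled.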
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 35703
82.8%
12000000 20
 
< 0.1%
10000000 19
 
< 0.1%
11000000 19
 
< 0.1%
2000000 18
 
< 0.1%
6000000 17
 
< 0.1%
5000000 14
 
< 0.1%
8000000 13
 
< 0.1%
500000 13
 
< 0.1%
1 12
 
< 0.1%
Other values (6823) 7249
 
16.8%
ValueCountFrequency (%)
0 35703
82.8%
1 12
 
< 0.1%
2 3
 
< 0.1%
3 8
 
< 0.1%
4 4
 
< 0.1%
5 5
 
< 0.1%
6 2
 
< 0.1%
7 4
 
< 0.1%
8 5
 
< 0.1%
9 1
 
< 0.1%
ValueCountFrequency (%)
2787965087 1
< 0.1%
2068223624 1
< 0.1%
1845034188 1
< 0.1%
1519557910 1
< 0.1%
1513528810 1
< 0.1%
1506249360 1
< 0.1%
1405403694 1
< 0.1%
1342000000 1
< 0.1%
1274219009 1
< 0.1%
1262886337 1
< 0.1%

runtime
Real number (ℝ)

Distinct 350
Distinct (%) 0.8%
Missing 219
Missing (%) 0.5%
Infinite 0
Infinite (%) 0.0%
Mean 95.819511
Minimum 0
Maximum 1256
Zeros 1276
Zeros (%) 3.0%
Negative 0
Negative (%) 0.0%
Memory size 336.8 KiB

Quantile statistics

Minimum 0
5-th percentile 29
Q1 86
median 95
Q3 107
95-th percentile 138
Maximum 1256
Range 1256
Interquartile range (IQR) 21

Descriptive statistics

Standard deviation 36.737938
Coefficient of variation (CV) 0.38340769
Kurtosis 99.586419
Mean 95.819511
Median Absolute Deviation (MAD) 10
Skewness 4.7735464
Sum 4108549
Variance 1349.6761
Monotonicity Not monotonic
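
A quick consistency check on these figures: dividing the reported sum by the reported mean should give the number of rows that entered the mean. The sketch below does that and compares the result with the row count minus the 219 missing values, using the 43097 observations from the report overview.

```python
# Figures copied from the runtime statistics in this report.
n_rows = 43097       # "Number of observations" in the overview
n_missing = 219
mean = 95.819511
total = 4108549

n_present = n_rows - n_missing

# Number of rows implied by Sum / Mean.
implied_count = total / mean

# What the mean would be if missing rows were counted as zero instead.
mean_if_missing_counted = total / n_rows

print(f"non-missing rows = {n_present}")
print(f"implied by stats = {implied_count:.2f}")
print(f"mean over all rows would be {mean_if_missing_counted:.2f}")
```

The implied count matches the non-missing rows, so the mean skips missing values. The 1276 zero runtimes, however, are included and lower it.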
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
90 2424
 
5.6%
100 1435
 
3.3%
95 1389
 
3.2%
0 1276
 
3.0%
93 1184
 
2.7%
96 1075
 
2.5%
92 1052
 
2.4%
91 1040
 
2.4%
94 1034
 
2.4%
97 1007
 
2.3%
Other values (340) 29962
69.5%
ValueCountFrequency (%)
0 1276
3.0%
1 51
 
0.1%
2 10
 
< 0.1%
3 24
 
0.1%
4 23
 
0.1%
5 31
 
0.1%
6 44
 
0.1%
7 71
 
0.2%
8 50
 
0.1%
9 47
 
0.1%
ValueCountFrequency (%)
1256 1
< 0.1%
1140 1
< 0.1%
931 1
< 0.1%
925 1
< 0.1%
900 1
< 0.1%
877 1
< 0.1%
874 1
< 0.1%
840 2
< 0.1%
780 1
< 0.1%
720 1
< 0.1%

spoken_languages
Categorical

HIGH CARDINALITY  IMBALANCE  MISSING 

Distinct 1793
Distinct (%) 4.5%
Missing 2848
Missing (%) 6.6%
Memory size 336.8 KiB
English
21820 
Français
 
1810
日本語
 
1240
Italiano
 
1207
Español
 
855
Other values (1788)
13317 

Length

Max length 171
Median length 7
Mean length 9.4028672
Min length 2

Characters and Unicode

Total characters 378456
Distinct characters 171
Distinct categories 8
Distinct scripts 15
Distinct blocks 16
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
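
This column mixes many writing systems, which is why the script and block counts are high. Python's standard library does not expose the Unicode Script property directly, so the sketch below uses the first word of each character's Unicode name as a rough stand-in. `rough_script` is a hypothetical helper; the proxy is an approximation and will not match the profiler's script labels exactly (for example, it reports "CJK" where the report says "Han").

```python
import unicodedata
from collections import Counter


def rough_script(ch):
    """Approximate a character's script by the first word of its Unicode name."""
    try:
        return unicodedata.name(ch).split()[0]
    except ValueError:
        # Some code points (e.g. control characters) have no name.
        return "UNNAMED"


# A few values that appear in this column.
sample = ["English", "Français", "日本語", "हिन्दी"]

script_counts = Counter(
    rough_script(ch) for value in sample for ch in value
)

for script, count in script_counts.most_common():
    print(script, count)
```

A proper script lookup would need the Unicode `Scripts.txt` data or a third-party package; the name-based proxy is only meant to show the idea.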

Unique

Unique 1258
Unique (%) 3.1%

Sample

1st row English
2nd row English, Français
3rd row English
4th row English
5th row English

Common Values

ValueCountFrequency (%)
English 21820
50.6%
Français 1810
 
4.2%
日本語 1240
 
2.9%
Italiano 1207
 
2.8%
Español 855
 
2.0%
Pусский 767
 
1.8%
Deutsch 731
 
1.7%
English, Français 669
 
1.6%
English, Español 563
 
1.3%
हिन्दी 479
 
1.1%
Other values (1783) 10108
23.5%
(Missing) 2848
 
6.6%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
english 28017
53.2%
français 4104
 
7.8%
deutsch 2563
 
4.9%
italiano 2346
 
4.5%
español 2322
 
4.4%
日本語 1687
 
3.2%
pусский 1497
 
2.8%
普通话 770
 
1.5%
हिन्दी 695
 
1.3%
644
 
1.2%
Other values (69) 8061
 
15.3%

Most occurring characters

ValueCountFrequency (%)
s 41168
10.9%
n 36458
 
9.6%
i 36204
 
9.6%
l 33764
 
8.9%
h 30678
 
8.1%
E 30393
 
8.0%
g 29418
 
7.8%
a 18289
 
4.8%
12645
 
3.3%
, 11374
 
3.0%
Other values (161) 98065
25.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 283509
74.9%
Uppercase Letter 45014
 
11.9%
Other Letter 21532
 
5.7%
Space Separator 12645
 
3.3%
Other Punctuation 12424
 
3.3%
Spacing Mark 1804
 
0.5%
Nonspacing Mark 1505
 
0.4%
Control 23
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 41168
14.5%
n 36458
12.9%
i 36204
12.8%
l 33764
11.9%
h 30678
10.8%
g 29418
10.4%
a 18289
6.5%
o 6740
 
2.4%
r 5978
 
2.1%
t 5860
 
2.1%
Other values (63) 38952
13.7%
Other Letter
ValueCountFrequency (%)
1687
 
7.8%
1687
 
7.8%
1687
 
7.8%
1237
 
5.7%
934
 
4.3%
770
 
3.6%
770
 
3.6%
695
 
3.2%
695
 
3.2%
695
 
3.2%
Other values (46) 10675
49.6%
Uppercase Letter
ValueCountFrequency (%)
E 30393
67.5%
F 4105
 
9.1%
D 2861
 
6.4%
P 2573
 
5.7%
I 2346
 
5.2%
N 690
 
1.5%
L 373
 
0.8%
M 349
 
0.8%
T 302
 
0.7%
Č 274
 
0.6%
Other values (13) 748
 
1.7%
Spacing Mark
ValueCountFrequency (%)
695
38.5%
ि 695
38.5%
134
 
7.4%
ி 111
 
6.2%
90
 
5.0%
45
 
2.5%
17
 
0.9%
17
 
0.9%
Nonspacing Mark
ValueCountFrequency (%)
695
46.2%
ִ 410
27.2%
ְ 205
 
13.6%
111
 
7.4%
67
 
4.5%
17
 
1.1%
Other Punctuation
ValueCountFrequency (%)
, 11374
91.5%
/ 1000
 
8.0%
? 50
 
0.4%
Space Separator
ValueCountFrequency (%)
12645
100.0%
Control
ValueCountFrequency (%)
š 23
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 316653
83.7%
Common 25092
 
6.6%
Han 10173
 
2.7%
Cyrillic 9983
 
2.6%
Devanagari 4170
 
1.1%
Hangul 3198
 
0.8%
Arabic 3137
 
0.8%
Greek 1656
 
0.4%
Hebrew 1640
 
0.4%
Thai 1239
 
0.3%
Other values (5) 1515
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 41168
13.0%
n 36458
11.5%
i 36204
11.4%
l 33764
10.7%
h 30678
9.7%
E 30393
9.6%
g 29418
9.3%
a 18289
 
5.8%
o 6740
 
2.1%
r 5978
 
1.9%
Other values (50) 47563
15.0%
Cyrillic
ValueCountFrequency (%)
с 3074
30.8%
к 1655
16.6%
и 1605
16.1%
й 1545
15.5%
у 1499
15.0%
а 106
 
1.1%
р 80
 
0.8%
У 48
 
0.5%
ь 48
 
0.5%
ї 48
 
0.5%
Other values (12) 275
 
2.8%
Arabic
ValueCountFrequency (%)
ا 504
16.1%
ر 504
16.1%
ة 322
10.3%
ي 322
10.3%
ع 322
10.3%
ل 322
10.3%
ب 322
10.3%
ی 131
 
4.2%
س 131
 
4.2%
ف 131
 
4.2%
Other values (5) 126
 
4.0%
Han
ValueCountFrequency (%)
1687
16.6%
1687
16.6%
1687
16.6%
1237
12.2%
934
9.2%
770
7.6%
770
7.6%
467
 
4.6%
广 467
 
4.6%
467
 
4.6%
Greek
ValueCountFrequency (%)
λ 414
25.0%
ά 207
12.5%
κ 207
12.5%
ι 207
12.5%
ν 207
12.5%
η 207
12.5%
ε 207
12.5%
Hebrew
ValueCountFrequency (%)
ִ 410
25.0%
ת 205
12.5%
י 205
12.5%
ר 205
12.5%
ְ 205
12.5%
ב 205
12.5%
ע 205
12.5%
Georgian
ValueCountFrequency (%)
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
Devanagari
ValueCountFrequency (%)
695
16.7%
695
16.7%
695
16.7%
695
16.7%
695
16.7%
ि 695
16.7%
Hangul
ValueCountFrequency (%)
533
16.7%
533
16.7%
533
16.7%
533
16.7%
533
16.7%
533
16.7%
Thai
ValueCountFrequency (%)
354
28.6%
177
14.3%
177
14.3%
177
14.3%
177
14.3%
177
14.3%
Gurmukhi
ValueCountFrequency (%)
17
16.7%
17
16.7%
17
16.7%
17
16.7%
17
16.7%
17
16.7%
Common
ValueCountFrequency (%)
12645
50.4%
, 11374
45.3%
/ 1000
 
4.0%
? 50
 
0.2%
š 23
 
0.1%
Telugu
ValueCountFrequency (%)
134
33.3%
67
16.7%
67
16.7%
67
16.7%
67
16.7%
Tamil
ValueCountFrequency (%)
111
20.0%
111
20.0%
ி 111
20.0%
111
20.0%
111
20.0%
Bengali
ValueCountFrequency (%)
90
40.0%
45
20.0%
45
20.0%
45
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 333150
88.0%
CJK 10173
 
2.7%
None 10127
 
2.7%
Cyrillic 9983
 
2.6%
Devanagari 4170
 
1.1%
Hangul 3198
 
0.8%
Arabic 3137
 
0.8%
Hebrew 1640
 
0.4%
Thai 1239
 
0.3%
Tamil 555
 
0.1%
Other values (6) 1084
 
0.3%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 41168
12.4%
n 36458
10.9%
i 36204
10.9%
l 33764
10.1%
h 30678
9.2%
E 30393
9.1%
g 29418
8.8%
a 18289
 
5.5%
12645
 
3.8%
, 11374
 
3.4%
Other values (38) 52759
15.8%
None
ValueCountFrequency (%)
ç 4346
42.9%
ñ 2322
22.9%
ê 569
 
5.6%
λ 414
 
4.1%
ý 274
 
2.7%
Č 274
 
2.7%
ü 242
 
2.4%
ά 207
 
2.0%
κ 207
 
2.0%
ι 207
 
2.0%
Other values (11) 1065
 
10.5%
Cyrillic
ValueCountFrequency (%)
с 3074
30.8%
к 1655
16.6%
и 1605
16.1%
й 1545
15.5%
у 1499
15.0%
а 106
 
1.1%
р 80
 
0.8%
У 48
 
0.5%
ь 48
 
0.5%
ї 48
 
0.5%
Other values (12) 275
 
2.8%
CJK
ValueCountFrequency (%)
1687
16.6%
1687
16.6%
1687
16.6%
1237
12.2%
934
9.2%
770
7.6%
770
7.6%
467
 
4.6%
广 467
 
4.6%
467
 
4.6%
Devanagari
ValueCountFrequency (%)
695
16.7%
695
16.7%
695
16.7%
695
16.7%
695
16.7%
ि 695
16.7%
Hangul
ValueCountFrequency (%)
533
16.7%
533
16.7%
533
16.7%
533
16.7%
533
16.7%
533
16.7%
Arabic
ValueCountFrequency (%)
ا 504
16.1%
ر 504
16.1%
ة 322
10.3%
ي 322
10.3%
ع 322
10.3%
ل 322
10.3%
ب 322
10.3%
ی 131
 
4.2%
س 131
 
4.2%
ف 131
 
4.2%
Other values (5) 126
 
4.0%
Hebrew
ValueCountFrequency (%)
ִ 410
25.0%
ת 205
12.5%
י 205
12.5%
ר 205
12.5%
ְ 205
12.5%
ב 205
12.5%
ע 205
12.5%
Thai
ValueCountFrequency (%)
354
28.6%
177
14.3%
177
14.3%
177
14.3%
177
14.3%
177
14.3%
Telugu
ValueCountFrequency (%)
134
33.3%
67
16.7%
67
16.7%
67
16.7%
67
16.7%
Tamil
ValueCountFrequency (%)
111
20.0%
111
20.0%
ி 111
20.0%
111
20.0%
111
20.0%
Bengali
ValueCountFrequency (%)
90
40.0%
45
20.0%
45
20.0%
45
20.0%
Latin Ext Additional
ValueCountFrequency (%)
ế 60
50.0%
60
50.0%
Georgian
ValueCountFrequency (%)
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
33
14.3%
Gurmukhi
ValueCountFrequency (%)
17
16.7%
17
16.7%
17
16.7%
17
16.7%
17
16.7%
17
16.7%
IPA Ext
ValueCountFrequency (%)
ə 4
100.0%

status
Categorical

Distinct 6
Distinct (%) < 0.1%
Missing 55
Missing (%) 0.1%
Memory size 336.8 KiB
Released
42706 
Rumored
 
210
Post Production
 
93
In Production
 
19
Planned
 
13

Length

Max length 15
Median length 8
Mean length 8.0121509
Min length 7

Characters and Unicode

Total characters 344859
Distinct characters 18
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row Released
2nd row Released
3rd row Released
4th row Released
5th row Released

Common Values

ValueCountFrequency (%)
Released 42706
99.1%
Rumored 210
 
0.5%
Post Production 93
 
0.2%
In Production 19
 
< 0.1%
Planned 13
 
< 0.1%
Canceled 1
 
< 0.1%
(Missing) 55
 
0.1%
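
The report overview flags status as highly imbalanced (97.0%). One common way to score imbalance is one minus the normalised Shannon entropy of the class counts. The sketch below applies that formula to the counts above; that the profiler uses exactly this formula is an assumption, and `imbalance_score` is a hypothetical helper written for this illustration.

```python
import math

# Class counts copied from the Common Values table above (missing rows excluded).
status_counts = {
    "Released": 42706,
    "Rumored": 210,
    "Post Production": 93,
    "In Production": 19,
    "Planned": 13,
    "Canceled": 1,
}


def imbalance_score(class_counts):
    """Return 1 - H/log2(k): 0 for a uniform split, near 1 for one dominant class."""
    k = len(class_counts)
    if k <= 1:
        return 0.0
    total = sum(class_counts)
    entropy = -sum(
        (c / total) * math.log2(c / total) for c in class_counts if c > 0
    )
    return 1 - entropy / math.log2(k)


score = imbalance_score(list(status_counts.values()))
print(f"imbalance score = {100 * score:.1f}%")
```

A score this close to 1 means the column carries very little information: almost every row is "Released", so it is unlikely to be useful as a feature without filtering or regrouping.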

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
released 42706
99.0%
rumored 210
 
0.5%
production 112
 
0.3%
post 93
 
0.2%
in 19
 
< 0.1%
planned 13
 
< 0.1%
canceled 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 128343
37.2%
d 43042
 
12.5%
R 42916
 
12.4%
s 42799
 
12.4%
l 42720
 
12.4%
a 42720
 
12.4%
o 527
 
0.2%
r 322
 
0.1%
u 322
 
0.1%
P 218
 
0.1%
Other values (8) 930
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 301593
87.5%
Uppercase Letter 43154
 
12.5%
Space Separator 112
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 128343
42.6%
d 43042
 
14.3%
s 42799
 
14.2%
l 42720
 
14.2%
a 42720
 
14.2%
o 527
 
0.2%
r 322
 
0.1%
u 322
 
0.1%
m 210
 
0.1%
t 205
 
0.1%
Other values (3) 383
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
R 42916
99.4%
P 218
 
0.5%
I 19
 
< 0.1%
C 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
112
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 344747
> 99.9%
Common 112
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 128343
37.2%
d 43042
 
12.5%
R 42916
 
12.4%
s 42799
 
12.4%
l 42720
 
12.4%
a 42720
 
12.4%
o 527
 
0.2%
r 322
 
0.1%
u 322
 
0.1%
P 218
 
0.1%
Other values (7) 818
 
0.2%
Common
ValueCountFrequency (%)
112
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 344859
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 128343
37.2%
d 43042
 
12.5%
R 42916
 
12.4%
s 42799
 
12.4%
l 42720
 
12.4%
a 42720
 
12.4%
o 527
 
0.2%
r 322
 
0.1%
u 322
 
0.1%
P 218
 
0.1%
Other values (8) 930
 
0.3%

tagline
Categorical

HIGH CARDINALITY  MISSING  UNIFORM 

Distinct 19884
Distinct (%) 99.3%
Missing 23063
Missing (%) 53.5%
Memory size 336.8 KiB
Which one is the first to return - memory or the murderer?
 
9
Based on a true story.
 
7
Trust no one.
 
4
Be careful what you wish for.
 
4
A love, a hope, a wall.
 
4
Other values (19879)
20006 

Length

Max length 297
Median length 204
Mean length 47.06484
Min length 1

Characters and Unicode

Total characters 942897
Distinct characters 159
Distinct categories 17
Distinct scripts 5
Distinct blocks 9
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 19779
Unique (%) 98.7%

Sample

1st row Roll the dice and unleash the excitement!
2nd row Still Yelling. Still Fighting. Still Ready for Love.
3rd row Friends are the people who let you be yourself... and never let you forget it.
4th row Just When His World Is Back To Normal... He's In For The Surprise Of His Life!
5th row A Los Angeles Crime Saga

Common Values

ValueCountFrequency (%)
Which one is the first to return - memory or the murderer? 9
 
< 0.1%
Based on a true story. 7
 
< 0.1%
Trust no one. 4
 
< 0.1%
Be careful what you wish for. 4
 
< 0.1%
A love, a hope, a wall. 4
 
< 0.1%
The adventure of a lifetime, in a few mere seconds. 4
 
< 0.1%
There is no solitude greater than that of the Samurai 4
 
< 0.1%
Some things are better left top secret. 4
 
< 0.1%
One Nation. Underfed. 4
 
< 0.1%
Every woman who has loved will understand 4
 
< 0.1%
Other values (19874) 19986
46.4%
(Missing) 23063
53.5%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 10813
 
6.3%
a 6698
 
3.9%
of 4314
 
2.5%
to 3518
 
2.1%
is 2749
 
1.6%
in 2659
 
1.6%
and 2648
 
1.5%
you 2354
 
1.4%
1579
 
0.9%
for 1501
 
0.9%
Other values (14876) 132301
77.3%

Most occurring characters

ValueCountFrequency (%)
151248
16.0%
e 92965
 
9.9%
t 56289
 
6.0%
o 55559
 
5.9%
a 50539
 
5.4%
n 46696
 
5.0%
i 45145
 
4.8%
r 44231
 
4.7%
s 41647
 
4.4%
h 36578
 
3.9%
Other values (149) 322000
34.2%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 668803
70.9%
Space Separator 151248
 
16.0%
Uppercase Letter 73902
 
7.8%
Other Punctuation 44102
 
4.7%
Decimal Number 2610
 
0.3%
Dash Punctuation 1924
 
0.2%
Final Punctuation 98
 
< 0.1%
Open Punctuation 55
 
< 0.1%
Close Punctuation 54
 
< 0.1%
Currency Symbol 37
 
< 0.1%
Other values (7) 64
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 92965
13.9%
t 56289
 
8.4%
o 55559
 
8.3%
a 50539
 
7.6%
n 46696
 
7.0%
i 45145
 
6.8%
r 44231
 
6.6%
s 41647
 
6.2%
h 36578
 
5.5%
l 29624
 
4.4%
Other values (43) 169530
25.3%
Uppercase Letter
ValueCountFrequency (%)
T 9838
 
13.3%
A 6751
 
9.1%
S 5570
 
7.5%
H 4345
 
5.9%
I 4327
 
5.9%
E 4261
 
5.8%
W 3617
 
4.9%
O 3438
 
4.7%
N 3167
 
4.3%
L 3137
 
4.2%
Other values (19) 25451
34.4%
Other Letter
ValueCountFrequency (%)
1
 
4.2%
1
 
4.2%
1
 
4.2%
1
 
4.2%
1
 
4.2%
1
 
4.2%
1
 
4.2%
1
 
4.2%
1
 
4.2%
1
 
4.2%
Other values (14) 14
58.3%
Other Punctuation
ValueCountFrequency (%)
. 26381
59.8%
! 5757
 
13.1%
' 5614
 
12.7%
, 4147
 
9.4%
? 1133
 
2.6%
" 570
 
1.3%
148
 
0.3%
: 138
 
0.3%
& 80
 
0.2%
* 40
 
0.1%
Other values (7) 94
 
0.2%
Decimal Number
ValueCountFrequency (%)
0 783
30.0%
1 497
19.0%
2 292
 
11.2%
3 203
 
7.8%
9 199
 
7.6%
5 162
 
6.2%
4 138
 
5.3%
6 119
 
4.6%
7 116
 
4.4%
8 101
 
3.9%
Math Symbol
ValueCountFrequency (%)
= 5
35.7%
+ 5
35.7%
| 2
 
14.3%
1
 
7.1%
~ 1
 
7.1%
Dash Punctuation
ValueCountFrequency (%)
- 1909
99.2%
8
 
0.4%
7
 
0.4%
Final Punctuation
ValueCountFrequency (%)
82
83.7%
15
 
15.3%
» 1
 
1.0%
Initial Punctuation
ValueCountFrequency (%)
14
73.7%
4
 
21.1%
« 1
 
5.3%
Open Punctuation
ValueCountFrequency (%)
( 49
89.1%
[ 6
 
10.9%
Close Punctuation
ValueCountFrequency (%)
) 48
88.9%
] 6
 
11.1%
Other Number
ValueCountFrequency (%)
½ 2
66.7%
² 1
33.3%
Modifier Letter
ValueCountFrequency (%)
ˈ 1
50.0%
ˌ 1
50.0%
Space Separator
ValueCountFrequency (%)
151248
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 37
100.0%
Nonspacing Mark
ValueCountFrequency (%)
1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 742705
78.8%
Common 200167
 
21.2%
Han 16
 
< 0.1%
Tamil 5
 
< 0.1%
Katakana 4
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 92965
 
12.5%
t 56289
 
7.6%
o 55559
 
7.5%
a 50539
 
6.8%
n 46696
 
6.3%
i 45145
 
6.1%
r 44231
 
6.0%
s 41647
 
5.6%
h 36578
 
4.9%
l 29624
 
4.0%
Other values (72) 243432
32.8%
Common
ValueCountFrequency (%)
151248
75.6%
. 26381
 
13.2%
! 5757
 
2.9%
' 5614
 
2.8%
, 4147
 
2.1%
- 1909
 
1.0%
? 1133
 
0.6%
0 783
 
0.4%
" 570
 
0.3%
1 497
 
0.2%
Other values (42) 2128
 
1.1%
Han
ValueCountFrequency (%)
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
Other values (6) 6
37.5%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 942482
> 99.9%
Punctuation 278
 
< 0.1%
None 107
 
< 0.1%
CJK 16
 
< 0.1%
Tamil 5
 
< 0.1%
Katakana 4
 
< 0.1%
IPA Ext 2
 
< 0.1%
Modifier Letters 2
 
< 0.1%
Math Operators 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
151248
16.0%
e 92965
 
9.9%
t 56289
 
6.0%
o 55559
 
5.9%
a 50539
 
5.4%
n 46696
 
5.0%
i 45145
 
4.8%
r 44231
 
4.7%
s 41647
 
4.4%
h 36578
 
3.9%
Other values (78) 321585
34.1%
Punctuation
ValueCountFrequency (%)
148
53.2%
82
29.5%
15
 
5.4%
14
 
5.0%
8
 
2.9%
7
 
2.5%
4
 
1.4%
None
ValueCountFrequency (%)
é 19
17.8%
ä 16
15.0%
ö 8
 
7.5%
á 6
 
5.6%
ó 5
 
4.7%
ü 5
 
4.7%
ı 5
 
4.7%
· 4
 
3.7%
ñ 3
 
2.8%
í 3
 
2.8%
Other values (25) 33
30.8%
IPA Ext
ValueCountFrequency (%)
ə 2
100.0%
CJK
ValueCountFrequency (%)
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
1
 
6.2%
Other values (6) 6
37.5%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Modifier Letters
ValueCountFrequency (%)
ˈ 1
50.0%
ˌ 1
50.0%
Math Operators
ValueCountFrequency (%)
1
100.0%

title
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct 39995
Distinct (%) 92.8%
Missing 0
Missing (%) 0.0%
Memory size 336.8 KiB
Blackout
 
13
Cinderella
 
11
Hamlet
 
9
Alice in Wonderland
 
9
King Lear
 
8
Other values (39990)
43047 

Length

Max length 104
Median length 78
Mean length 16.560387
Min length 1

Characters and Unicode

Total characters 713703
Distinct characters 270
Distinct categories 17
Distinct scripts 6
Distinct blocks 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 37746
Unique (%) 87.6%

Sample

1st row Toy Story
2nd row Jumanji
3rd row Grumpier Old Men
4th row Waiting to Exhale
5th row Father of the Bride Part II

Common Values

ValueCountFrequency (%)
Blackout 13
 
< 0.1%
Cinderella 11
 
< 0.1%
Hamlet 9
 
< 0.1%
Alice in Wonderland 9
 
< 0.1%
King Lear 8
 
< 0.1%
The Promise 8
 
< 0.1%
Les Misérables 8
 
< 0.1%
Beauty and the Beast 8
 
< 0.1%
Treasure Island 7
 
< 0.1%
A Christmas Carol 7
 
< 0.1%
Other values (39985) 43009
99.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
the 13807
 
10.8%
of 4621
 
3.6%
a 2107
 
1.6%
in 1596
 
1.2%
and 1532
 
1.2%
to 994
 
0.8%
699
 
0.5%
love 637
 
0.5%
man 630
 
0.5%
for 551
 
0.4%
Other values (23154) 101214
78.8%

Most occurring characters

ValueCountFrequency (%)
85309
 
12.0%
e 71975
 
10.1%
a 46149
 
6.5%
o 43004
 
6.0%
n 38378
 
5.4%
r 37757
 
5.3%
i 37369
 
5.2%
t 34555
 
4.8%
s 27860
 
3.9%
h 26970
 
3.8%
Other values (260) 264377
37.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 503151
70.5%
Uppercase Letter 110650
 
15.5%
Space Separator 85309
 
12.0%
Other Punctuation 9787
 
1.4%
Decimal Number 3617
 
0.5%
Dash Punctuation 918
 
0.1%
Close Punctuation 74
 
< 0.1%
Open Punctuation 72
 
< 0.1%
Final Punctuation 36
 
< 0.1%
Currency Symbol 20
 
< 0.1%
Other values (7) 69
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 71975
14.3%
a 46149
9.2%
o 43004
 
8.5%
n 38378
 
7.6%
r 37757
 
7.5%
i 37369
 
7.4%
t 34555
 
6.9%
s 27860
 
5.5%
h 26970
 
5.4%
l 24470
 
4.9%
Other values (117) 114664
22.8%
Uppercase Letter
ValueCountFrequency (%)
T 15202
13.7%
S 9686
 
8.8%
M 7598
 
6.9%
B 7239
 
6.5%
C 6804
 
6.1%
A 6397
 
5.8%
D 6033
 
5.5%
L 5528
 
5.0%
H 4873
 
4.4%
W 4840
 
4.4%
Other values (60) 36450
32.9%
Other Punctuation
ValueCountFrequency (%)
: 3414
34.9%
' 2353
24.0%
. 1539
15.7%
, 1061
 
10.8%
! 614
 
6.3%
& 424
 
4.3%
? 244
 
2.5%
/ 72
 
0.7%
* 18
 
0.2%
# 11
 
0.1%
Other values (8) 37
 
0.4%
Other Letter
ValueCountFrequency (%)
چ 2
 
10.0%
ک 2
 
10.0%
ی 2
 
10.0%
ه 2
 
10.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
س 1
 
5.0%
Other values (6) 6
30.0%
Decimal Number
ValueCountFrequency (%)
2 811
22.4%
1 646
17.9%
0 571
15.8%
3 466
12.9%
4 216
 
6.0%
9 215
 
5.9%
5 208
 
5.8%
7 181
 
5.0%
8 153
 
4.2%
6 150
 
4.1%
Other Number
ValueCountFrequency (%)
½ 9
56.2%
² 3
 
18.8%
³ 2
 
12.5%
1
 
6.2%
1
 
6.2%
Math Symbol
ValueCountFrequency (%)
+ 14
73.7%
× 3
 
15.8%
= 1
 
5.3%
1
 
5.3%
Other Symbol
ValueCountFrequency (%)
° 3
42.9%
2
28.6%
1
 
14.3%
1
 
14.3%
Currency Symbol
ValueCountFrequency (%)
$ 17
85.0%
¢ 2
 
10.0%
£ 1
 
5.0%
Dash Punctuation
ValueCountFrequency (%)
- 903
98.4%
15
 
1.6%
Close Punctuation
ValueCountFrequency (%)
) 69
93.2%
] 5
 
6.8%
Open Punctuation
ValueCountFrequency (%)
( 67
93.1%
[ 5
 
6.9%
Final Punctuation
ValueCountFrequency (%)
35
97.2%
1
 
2.8%
Initial Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
85309
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Format
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 613447
86.0%
Common 99882
 
14.0%
Cyrillic 205
 
< 0.1%
Greek 150
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 71975
 
11.7%
a 46149
 
7.5%
o 43004
 
7.0%
n 38378
 
6.3%
r 37757
 
6.2%
i 37369
 
6.1%
t 34555
 
5.6%
s 27860
 
4.5%
h 26970
 
4.4%
l 24470
 
4.0%
Other values (106) 224960
36.7%
Common
ValueCountFrequency (%)
85309
85.4%
: 3414
 
3.4%
' 2353
 
2.4%
. 1539
 
1.5%
, 1061
 
1.1%
- 903
 
0.9%
2 811
 
0.8%
1 646
 
0.6%
! 614
 
0.6%
0 571
 
0.6%
Other values (47) 2661
 
2.7%
Cyrillic
ValueCountFrequency (%)
о 21
 
10.2%
а 17
 
8.3%
р 14
 
6.8%
е 14
 
6.8%
и 12
 
5.9%
н 12
 
5.9%
л 10
 
4.9%
ь 9
 
4.4%
к 9
 
4.4%
в 8
 
3.9%
Other values (33) 79
38.5%
Greek
ValueCountFrequency (%)
α 17
 
11.3%
ι 13
 
8.7%
ο 11
 
7.3%
τ 8
 
5.3%
ρ 7
 
4.7%
λ 7
 
4.7%
ν 6
 
4.0%
ά 6
 
4.0%
η 6
 
4.0%
ς 6
 
4.0%
Other values (29) 63
42.0%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arabic
ValueCountFrequency (%)
چ 2
18.2%
ک 2
18.2%
ی 2
18.2%
ه 2
18.2%
س 1
9.1%
ا 1
9.1%
ج 1
9.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 712371
99.8%
None 1043
 
0.1%
Cyrillic 205
 
< 0.1%
Punctuation 59
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%
Misc Symbols 3
 
< 0.1%
Letterlike Symbols 1
 
< 0.1%
Number Forms 1
 
< 0.1%
Math Operators 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
85309
 
12.0%
e 71975
 
10.1%
a 46149
 
6.5%
o 43004
 
6.0%
n 38378
 
5.4%
r 37757
 
5.3%
i 37369
 
5.2%
t 34555
 
4.9%
s 27860
 
3.9%
h 26970
 
3.8%
Other values (76) 263045
36.9%
None
ValueCountFrequency (%)
é 205
19.7%
ä 112
 
10.7%
ö 53
 
5.1%
è 49
 
4.7%
ô 44
 
4.2%
ü 36
 
3.5%
ó 35
 
3.4%
á 34
 
3.3%
ı 32
 
3.1%
í 32
 
3.1%
Other values (104) 411
39.4%
Punctuation
ValueCountFrequency (%)
35
59.3%
15
25.4%
4
 
6.8%
2
 
3.4%
1
 
1.7%
1
 
1.7%
1
 
1.7%
Cyrillic
ValueCountFrequency (%)
о 21
 
10.2%
а 17
 
8.3%
р 14
 
6.8%
е 14
 
6.8%
и 12
 
5.9%
н 12
 
5.9%
л 10
 
4.9%
ь 9
 
4.4%
к 9
 
4.4%
в 8
 
3.9%
Other values (33) 79
38.5%
Arabic
ValueCountFrequency (%)
چ 2
18.2%
ک 2
18.2%
ی 2
18.2%
ه 2
18.2%
س 1
9.1%
ا 1
9.1%
ج 1
9.1%
Misc Symbols
ValueCountFrequency (%)
2
66.7%
1
33.3%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Letterlike Symbols
ValueCountFrequency (%)
1
100.0%
Number Forms
ValueCountFrequency (%)
1
100.0%
Math Operators
ValueCountFrequency (%)
1
100.0%

vote_average
Real number (ℝ)

Distinct: 92
Distinct (%): 0.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 5.6708959
Minimum: 0
Maximum: 10
Zeros: 2369
Zeros (%): 5.5%
Negative: 0
Negative (%): 0.0%
Memory size: 336.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 5
Median: 6
Q3: 6.8
95-th percentile: 7.8
Maximum: 10
Range: 10
Interquartile range (IQR): 1.8

Descriptive statistics

Standard deviation: 1.8229637
Coefficient of variation (CV): 0.32145956
Kurtosis: 2.9701871
Mean: 5.6708959
Median Absolute Deviation (MAD): 0.8
Skewness: -1.5708793
Sum: 244398.6
Variance: 3.3231967
Monotonicity: Not monotonic
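
Several of the figures above are derived from one another, which makes a quick consistency check possible. The sketch below recomputes the coefficient of variation (standard deviation divided by mean), the interquartile range (Q3 minus Q1), the range, the variance (standard deviation squared) and the sum (mean times the number of observations) from the reported values for vote_average. The variable names are illustrative, and `n` is taken from the report's overall observation count on the assumption that no rows are missing for this variable, as the summary states.

```python
# Values copied from the vote_average summary in this report.
mean = 5.6708959
std = 1.8229637
variance = 3.3231967
q1, q3 = 5.0, 6.8
minimum, maximum = 0.0, 10.0
reported_sum = 244398.6
n = 43097  # number of observations; Missing is 0 for this variable

cv = std / mean                  # coefficient of variation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
variance_from_std = std ** 2     # should agree with the reported variance
sum_from_mean = mean * n         # should agree with the reported sum

print(f"CV: {cv:.8f}")
print(f"IQR: {iqr:.1f}")
print(f"Range: {value_range:.0f}")
print(f"std squared: {variance_from_std:.7f}")
print(f"mean times n: {sum_from_mean:.1f}")
```

Small differences in the last digits are expected because the report rounds each statistic before displaying it.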
[Figure: Histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
0 2369
 
5.5%
6 2332
 
5.4%
5 1810
 
4.2%
7 1716
 
4.0%
6.5 1651
 
3.8%
6.3 1553
 
3.6%
5.8 1354
 
3.1%
5.5 1331
 
3.1%
6.4 1321
 
3.1%
6.7 1313
 
3.0%
Other values (82) 26347
61.1%
ValueCountFrequency (%)
0 2369
5.5%
0.5 13
 
< 0.1%
0.7 1
 
< 0.1%
1 90
 
0.2%
1.1 1
 
< 0.1%
1.2 4
 
< 0.1%
1.3 13
 
< 0.1%
1.4 5
 
< 0.1%
1.5 28
 
0.1%
1.6 6
 
< 0.1%
ValueCountFrequency (%)
10 153
0.4%
9.8 1
 
< 0.1%
9.6 1
 
< 0.1%
9.5 17
 
< 0.1%
9.4 2
 
< 0.1%
9.3 17
 
< 0.1%
9.2 4
 
< 0.1%
9.1 2
 
< 0.1%
9 125
0.3%
8.9 7
 
< 0.1%

vote_count
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 1820
Distinct (%): 4.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 115.72694
Minimum: 0
Maximum: 14075
Zeros: 2277
Zeros (%): 5.3%
Negative: 0
Negative (%): 0.0%
Memory size: 336.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 4
Median: 11
Q3: 37
95-th percentile: 464
Maximum: 14075
Range: 14075
Interquartile range (IQR): 33

Descriptive statistics

Standard deviation: 503.9498
Coefficient of variation (CV): 4.3546455
Kurtosis: 143.47233
Mean: 115.72694
Median Absolute Deviation (MAD): 9
Skewness: 10.180276
Sum: 4987484
Variance: 253965.4
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
2 2797
 
6.5%
1 2788
 
6.5%
3 2564
 
5.9%
4 2312
 
5.4%
0 2277
 
5.3%
5 1983
 
4.6%
6 1680
 
3.9%
7 1505
 
3.5%
8 1309
 
3.0%
9 1161
 
2.7%
Other values (1810) 22721
52.7%
ValueCountFrequency (%)
0 2277
5.3%
1 2788
6.5%
2 2797
6.5%
3 2564
5.9%
4 2312
5.4%
5 1983
4.6%
6 1680
3.9%
7 1505
3.5%
8 1309
3.0%
9 1161
2.7%
ValueCountFrequency (%)
14075 1
< 0.1%
12269 1
< 0.1%
12114 1
< 0.1%
12000 1
< 0.1%
11444 1
< 0.1%
11187 1
< 0.1%
10297 1
< 0.1%
10014 1
< 0.1%
9678 1
< 0.1%
9634 1
< 0.1%

release_year
Real number (ℝ)

Distinct: 132
Distinct (%): 0.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1991.6549
Minimum: 1878
Maximum: 2020
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 336.8 KiB

Quantile statistics

Minimum: 1878
5-th percentile: 1941
Q1: 1978
Median: 2001
Q3: 2010
95-th percentile: 2015
Maximum: 2020
Range: 142
Interquartile range (IQR): 32

Descriptive statistics

Standard deviation: 23.912476
Coefficient of variation (CV): 0.012006335
Kurtosis: 0.5772303
Mean: 1991.6549
Median Absolute Deviation (MAD): 12
Skewness: -1.1591477
Sum: 85834350
Variance: 571.80649
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
2014 1812
 
4.2%
2015 1786
 
4.1%
2013 1768
 
4.1%
2012 1584
 
3.7%
2011 1563
 
3.6%
2016 1523
 
3.5%
2009 1488
 
3.5%
2010 1414
 
3.3%
2008 1371
 
3.2%
2007 1228
 
2.8%
Other values (122) 27560
63.9%
ValueCountFrequency (%)
1878 1
 
< 0.1%
1888 1
 
< 0.1%
1890 3
 
< 0.1%
1891 2
 
< 0.1%
1892 1
 
< 0.1%
1893 1
 
< 0.1%
1894 10
< 0.1%
1895 5
< 0.1%
1896 8
< 0.1%
1897 4
 
< 0.1%
ValueCountFrequency (%)
2020 1
 
< 0.1%
2018 5
 
< 0.1%
2017 509
 
1.2%
2016 1523
3.5%
2015 1786
4.1%
2014 1812
4.2%
2013 1768
4.1%
2012 1584
3.7%
2011 1563
3.6%
2010 1414
3.3%

return
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct: 5221
Distinct (%): 12.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 694.94573
Minimum: 0
Maximum: 12396383
Zeros: 37715
Zeros (%): 87.5%
Negative: 0
Negative (%): 0.0%
Memory size: 336.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
Median: 0
Q3: 0
95-th percentile: 2.6864717
Maximum: 12396383
Range: 12396383
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 76642.657
Coefficient of variation (CV): 110.28582
Kurtosis: 19634.542
Mean: 694.94573
Median Absolute Deviation (MAD): 0
Skewness: 134.8107
Sum: 29950076
Variance: 5.8740969 × 10^9
Monotonicity: Not monotonic
[Figure: Histogram with fixed size bins (bins=50)]
ValueCountFrequency (%)
0 37715
87.5%
1 20
 
< 0.1%
2 12
 
< 0.1%
4 11
 
< 0.1%
5 8
 
< 0.1%
3 7
 
< 0.1%
2.5 7
 
< 0.1%
1.333333333 7
 
< 0.1%
1.5 6
 
< 0.1%
0.13615 4
 
< 0.1%
Other values (5211) 5300
 
12.3%
Value | Count | Frequency (%)
0 | 37715 | 87.5%
5.217391304 × 10^-7 | 1 | < 0.1%
7.5 × 10^-7 | 1 | < 0.1%
9.375 × 10^-7 | 1 | < 0.1%
1.499133126 × 10^-6 | 1 | < 0.1%
1.8 × 10^-6 | 1 | < 0.1%
1.916666667 × 10^-6 | 1 | < 0.1%
3.5 × 10^-6 | 1 | < 0.1%
4 × 10^-6 | 1 | < 0.1%
5.111111111 × 10^-6 | 1 | < 0.1%
ValueCountFrequency (%)
12396383 1
< 0.1%
8500000 1
< 0.1%
4197476.625 1
< 0.1%
2755584 1
< 0.1%
1018619.283 1
< 0.1%
1000000 1
< 0.1%
26881.72043 1
< 0.1%
12890.38667 1
< 0.1%
5330.33945 1
< 0.1%
4133.333333 1
< 0.1%

ActorNames
Categorical

HIGH CARDINALITY  UNIFORM 

Distinct: 42656
Distinct (%): 99.0%
Missing: 0
Missing (%): 0.0%
Memory size: 336.8 KiB

Georges Méliès: 24
Louis Theroux: 15
Mel Blanc: 12
Petteri Summanen, Ismo Kallio, Eppu Salminen, Irina Björklund, Hannu-Pekka Björkman, Jenni Banerjee, Mikko Leppilampi, Lena Meriläinen, Mari Perankoski, Risto Kaskilahti: 9
Jimmy Carr: 9
Other values (42651): 43028

Length

Max length: 4551
Median length: 1364
Mean length: 197.98868
Min length: 4

Characters and Unicode

Total characters: 8532718
Distinct characters: 395
Distinct categories: 16
Distinct scripts: 9
Distinct blocks: 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 42451
Unique (%): 98.5%

Sample

1st row: Tom Hanks, Tim Allen, Don Rickles, Jim Varney, Wallace Shawn, John Ratzenberger, Annie Potts, John Morris, Erik von Detten, Laurie Metcalf, R. Lee Ermey, Sarah Freeman, Penn Jillette
2nd row: Robin Williams, Jonathan Hyde, Kirsten Dunst, Bradley Pierce, Bonnie Hunt, Bebe Neuwirth, David Alan Grier, Patricia Clarkson, Adam Hann-Byrd, Laura Bell Bundy, James Handy, Gillian Barber, Brandon Obray, Cyrus Thiedeke, Gary Joseph Thorup, Leonard Zola, Lloyd Berry, Malcolm Stewart, Annabel Kershaw, Darryl Henriques, Robyn Driscoll, Peter Bryant, Sarah Gilson, Florica Vlad, June Lion, Brenda Lockmuller
3rd row: Walter Matthau, Jack Lemmon, Ann-Margret, Sophia Loren, Daryl Hannah, Burgess Meredith, Kevin Pollak
4th row: Whitney Houston, Angela Bassett, Loretta Devine, Lela Rochon, Gregory Hines, Dennis Haysbert, Michael Beach, Mykelti Williamson, Lamont Johnson, Wesley Snipes
5th row: Steve Martin, Diane Keaton, Martin Short, Kimberly Williams-Paisley, George Newbern, Kieran Culkin, BD Wong, Peter Michael Goetz, Kate McGregor-Stewart, Jane Adams, Eugene Levy, Lori Alan

Common Values

Value | Count | Frequency (%)
Georges Méliès | 24 | 0.1%
Louis Theroux | 15 | < 0.1%
Mel Blanc | 12 | < 0.1%
Petteri Summanen, Ismo Kallio, Eppu Salminen, Irina Björklund, Hannu-Pekka Björkman, Jenni Banerjee, Mikko Leppilampi, Lena Meriläinen, Mari Perankoski, Risto Kaskilahti | 9 | < 0.1%
Jimmy Carr | 9 | < 0.1%
Werner Herzog | 8 | < 0.1%
Louis C.K. | 8 | < 0.1%
George Carlin | 8 | < 0.1%
David Attenborough | 8 | < 0.1%
Jim Jefferies | 6 | < 0.1%
Other values (42646) | 42990 | 99.8%

Length

[Figure: Histogram of lengths of the category]
ValueCountFrequency (%)
john 9805
 
0.8%
michael 7470
 
0.6%
david 6193
 
0.5%
robert 5734
 
0.5%
james 5693
 
0.5%
richard 4446
 
0.4%
paul 4323
 
0.4%
peter 3903
 
0.3%
william 3432
 
0.3%
george 3419
 
0.3%
Other values (112933) 1112239
95.3%

Most occurring characters

ValueCountFrequency (%)
1123688
 
13.2%
a 706002
 
8.3%
e 666306
 
7.8%
n 524946
 
6.2%
, 520208
 
6.1%
r 498143
 
5.8%
i 484754
 
5.7%
o 424335
 
5.0%
l 366950
 
4.3%
s 256209
 
3.0%
Other values (385) 2961177
34.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 5659408
66.3%
Uppercase Letter 1192120
 
14.0%
Space Separator 1123691
 
13.2%
Other Punctuation 542538
 
6.4%
Dash Punctuation 14137
 
0.2%
Other Letter 543
 
< 0.1%
Decimal Number 94
 
< 0.1%
Final Punctuation 83
 
< 0.1%
Initial Punctuation 23
 
< 0.1%
Open Punctuation 23
 
< 0.1%
Other values (6) 58
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 706002
12.5%
e 666306
11.8%
n 524946
9.3%
r 498143
 
8.8%
i 484754
 
8.6%
o 424335
 
7.5%
l 366950
 
6.5%
s 256209
 
4.5%
t 253591
 
4.5%
h 198191
 
3.5%
Other values (138) 1279981
22.6%
Other Letter
ValueCountFrequency (%)
ا 32
 
5.9%
م 31
 
5.7%
ی 19
 
3.5%
ع 19
 
3.5%
ن 18
 
3.3%
د 17
 
3.1%
17
 
3.1%
ر 17
 
3.1%
ي 16
 
2.9%
12
 
2.2%
Other values (104) 345
63.5%
Uppercase Letter
ValueCountFrequency (%)
M 109546
 
9.2%
S 92448
 
7.8%
C 84111
 
7.1%
J 83437
 
7.0%
B 82503
 
6.9%
A 70925
 
5.9%
R 67478
 
5.7%
D 65952
 
5.5%
L 61244
 
5.1%
G 54738
 
4.6%
Other values (81) 419738
35.2%
Decimal Number
ValueCountFrequency (%)
5 37
39.4%
0 29
30.9%
1 8
 
8.5%
2 8
 
8.5%
9 4
 
4.3%
3 2
 
2.1%
7 2
 
2.1%
4 2
 
2.1%
8 1
 
1.1%
6 1
 
1.1%
Other Punctuation
ValueCountFrequency (%)
, 520208
95.9%
. 16076
 
3.0%
' 6098
 
1.1%
" 129
 
< 0.1%
· 9
 
< 0.1%
: 6
 
< 0.1%
& 6
 
< 0.1%
! 5
 
< 0.1%
/ 1
 
< 0.1%
Nonspacing Mark
ValueCountFrequency (%)
́ 10
58.8%
2
 
11.8%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
1
 
5.9%
Final Punctuation
ValueCountFrequency (%)
74
89.2%
6
 
7.2%
» 3
 
3.6%
Space Separator
ValueCountFrequency (%)
1123688
> 99.9%
  3
 
< 0.1%
Initial Punctuation
ValueCountFrequency (%)
20
87.0%
« 3
 
13.0%
Open Punctuation
ValueCountFrequency (%)
14
60.9%
( 9
39.1%
Format
ValueCountFrequency (%)
5
83.3%
1
 
16.7%
Dash Punctuation
ValueCountFrequency (%)
- 14137
100.0%
Control
ValueCountFrequency (%)
21
100.0%
Close Punctuation
ValueCountFrequency (%)
) 9
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 3
100.0%
Modifier Symbol
ValueCountFrequency (%)
´ 2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 6848444
80.3%
Common 1680629
 
19.7%
Cyrillic 3070
 
< 0.1%
Han 276
 
< 0.1%
Arabic 241
 
< 0.1%
Thai 27
 
< 0.1%
Greek 14
 
< 0.1%
Inherited 11
 
< 0.1%
Hangul 6
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 706002
 
10.3%
e 666306
 
9.7%
n 524946
 
7.7%
r 498143
 
7.3%
i 484754
 
7.1%
o 424335
 
6.2%
l 366950
 
5.4%
s 256209
 
3.7%
t 253591
 
3.7%
h 198191
 
2.9%
Other values (163) 2469017
36.1%
Han
ValueCountFrequency (%)
17
 
6.2%
12
 
4.3%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
9
 
3.3%
9
 
3.3%
Other values (55) 163
59.1%
Cyrillic
ValueCountFrequency (%)
а 323
 
10.5%
и 315
 
10.3%
о 233
 
7.6%
н 229
 
7.5%
р 215
 
7.0%
е 174
 
5.7%
л 155
 
5.0%
к 136
 
4.4%
т 115
 
3.7%
с 109
 
3.6%
Other values (51) 1066
34.7%
Common
ValueCountFrequency (%)
1123688
66.9%
, 520208
31.0%
. 16076
 
1.0%
- 14137
 
0.8%
' 6098
 
0.4%
" 129
 
< 0.1%
74
 
< 0.1%
5 37
 
< 0.1%
0 29
 
< 0.1%
21
 
< 0.1%
Other values (24) 132
 
< 0.1%
Arabic
ValueCountFrequency (%)
ا 32
13.3%
م 31
12.9%
ی 19
 
7.9%
ع 19
 
7.9%
ن 18
 
7.5%
د 17
 
7.1%
ر 17
 
7.1%
ي 16
 
6.6%
ل 9
 
3.7%
ب 8
 
3.3%
Other values (18) 55
22.8%
Thai
ValueCountFrequency (%)
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
Other values (11) 11
40.7%
Hangul
ValueCountFrequency (%)
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
Greek
ValueCountFrequency (%)
ν 6
42.9%
ί 2
 
14.3%
Ζ 2
 
14.3%
α 2
 
14.3%
ο 2
 
14.3%
Inherited
ValueCountFrequency (%)
́ 10
90.9%
1
 
9.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 8490576
99.5%
None 38336
 
0.4%
Cyrillic 3070
 
< 0.1%
CJK 276
 
< 0.1%
Arabic 241
 
< 0.1%
Punctuation 120
 
< 0.1%
Latin Ext Additional 56
 
< 0.1%
Thai 27
 
< 0.1%
Diacriticals 10
 
< 0.1%
Hangul 6
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
1123688
 
13.2%
a 706002
 
8.3%
e 666306
 
7.8%
n 524946
 
6.2%
, 520208
 
6.1%
r 498143
 
5.9%
i 484754
 
5.7%
o 424335
 
5.0%
l 366950
 
4.3%
s 256209
 
3.0%
Other values (66) 2919035
34.4%
None
ValueCountFrequency (%)
é 9105
23.8%
á 4156
 
10.8%
í 2756
 
7.2%
ô 2334
 
6.1%
ö 2039
 
5.3%
ó 1881
 
4.9%
ü 1497
 
3.9%
ć 1360
 
3.5%
è 1243
 
3.2%
ä 1002
 
2.6%
Other values (111) 10963
28.6%
Cyrillic
ValueCountFrequency (%)
а 323
 
10.5%
и 315
 
10.3%
о 233
 
7.6%
н 229
 
7.5%
р 215
 
7.0%
е 174
 
5.7%
л 155
 
5.0%
к 136
 
4.4%
т 115
 
3.7%
с 109
 
3.6%
Other values (51) 1066
34.7%
Punctuation
ValueCountFrequency (%)
74
61.7%
20
 
16.7%
14
 
11.7%
6
 
5.0%
5
 
4.2%
1
 
0.8%
Arabic
ValueCountFrequency (%)
ا 32
13.3%
م 31
12.9%
ی 19
 
7.9%
ع 19
 
7.9%
ن 18
 
7.5%
د 17
 
7.1%
ر 17
 
7.1%
ي 16
 
6.6%
ل 9
 
3.7%
ب 8
 
3.3%
Other values (18) 55
22.8%
CJK
ValueCountFrequency (%)
17
 
6.2%
12
 
4.3%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
11
 
4.0%
9
 
3.3%
9
 
3.3%
Other values (55) 163
59.1%
Latin Ext Additional
ValueCountFrequency (%)
15
26.8%
9
16.1%
6
 
10.7%
6
 
10.7%
ế 5
 
8.9%
4
 
7.1%
4
 
7.1%
4
 
7.1%
2
 
3.6%
1
 
1.8%
Diacriticals
ValueCountFrequency (%)
́ 10
100.0%
Thai
ValueCountFrequency (%)
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
2
 
7.4%
1
 
3.7%
1
 
3.7%
1
 
3.7%
1
 
3.7%
Other values (11) 11
40.7%
Hangul
ValueCountFrequency (%)
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%
1
16.7%

DirectorNames
Categorical

HIGH CARDINALITY  MISSING 

Distinct: 17738
Distinct (%): 41.6%
Missing: 444
Missing (%): 1.0%
Memory size: 336.8 KiB

John Ford: 63
Michael Curtiz: 61
Alfred Hitchcock: 52
Werner Herzog: 47
Woody Allen: 47
Other values (17733): 42383

Length

Max length: 654
Median length: 468
Mean length: 14.937636
Min length: 2

Characters and Unicode

Total characters: 637135
Distinct characters: 202
Distinct categories: 11
Distinct scripts: 6
Distinct blocks: 7
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 11279
Unique (%): 26.4%

Sample

1st row: John Lasseter
2nd row: Joe Johnston
3rd row: Howard Deutch
4th row: Forest Whitaker
5th row: Charles Shyer

Common Values

Value | Count | Frequency (%)
John Ford | 63 | 0.1%
Michael Curtiz | 61 | 0.1%
Alfred Hitchcock | 52 | 0.1%
Werner Herzog | 47 | 0.1%
Woody Allen | 47 | 0.1%
Sidney Lumet | 45 | 0.1%
Charlie Chaplin | 43 | 0.1%
William A. Wellman | 41 | 0.1%
Henry Hathaway | 41 | 0.1%
Richard Thorpe | 40 | 0.1%
Other values (17728) | 42173 | 97.9%
(Missing) | 444 | 1.0%

Length

[Figure: Histogram of lengths of the category]
ValueCountFrequency (%)
john 1200
 
1.2%
michael 906
 
0.9%
david 858
 
0.9%
robert 831
 
0.8%
peter 560
 
0.6%
william 535
 
0.5%
richard 531
 
0.5%
james 480
 
0.5%
paul 460
 
0.5%
george 430
 
0.4%
Other values (17697) 91393
93.1%

Most occurring characters

ValueCountFrequency (%)
55621
 
8.7%
e 54772
 
8.6%
a 54169
 
8.5%
r 42687
 
6.7%
n 42156
 
6.6%
i 40928
 
6.4%
o 37102
 
5.8%
l 28878
 
4.5%
s 21718
 
3.4%
t 20712
 
3.3%
Other values (192) 238392
37.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 472893
74.2%
Uppercase Letter 99940
 
15.7%
Space Separator 55621
 
8.7%
Other Punctuation 7299
 
1.1%
Dash Punctuation 1336
 
0.2%
Other Letter 23
 
< 0.1%
Control 12
 
< 0.1%
Decimal Number 6
 
< 0.1%
Open Punctuation 2
 
< 0.1%
Close Punctuation 2
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 54772
11.6%
a 54169
11.5%
r 42687
 
9.0%
n 42156
 
8.9%
i 40928
 
8.7%
o 37102
 
7.8%
l 28878
 
6.1%
s 21718
 
4.6%
t 20712
 
4.4%
h 17561
 
3.7%
Other values (96) 112210
23.7%
Uppercase Letter
ValueCountFrequency (%)
M 8802
 
8.8%
S 8331
 
8.3%
J 7488
 
7.5%
R 6425
 
6.4%
C 6288
 
6.3%
B 6207
 
6.2%
A 6003
 
6.0%
D 5307
 
5.3%
L 5158
 
5.2%
G 4791
 
4.8%
Other values (52) 35140
35.2%
Other Letter
ValueCountFrequency (%)
م 2
 
8.7%
ی 2
 
8.7%
ا 2
 
8.7%
1
 
4.3%
1
 
4.3%
د 1
 
4.3%
1
 
4.3%
ع 1
 
4.3%
1
 
4.3%
پ 1
 
4.3%
Other values (10) 10
43.5%
Other Punctuation
ValueCountFrequency (%)
, 4100
56.2%
. 2987
40.9%
' 211
 
2.9%
· 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 3
50.0%
5 1
 
16.7%
9 1
 
16.7%
3 1
 
16.7%
Space Separator
ValueCountFrequency (%)
55621
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1336
100.0%
Control
ValueCountFrequency (%)
12
100.0%
Open Punctuation
ValueCountFrequency (%)
( 2
100.0%
Close Punctuation
ValueCountFrequency (%)
) 2
100.0%
Math Symbol
ValueCountFrequency (%)
| 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 572659
89.9%
Common 64279
 
10.1%
Cyrillic 174
 
< 0.1%
Arabic 10
 
< 0.1%
Han 10
 
< 0.1%
Hangul 3
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 54772
 
9.6%
a 54169
 
9.5%
r 42687
 
7.5%
n 42156
 
7.4%
i 40928
 
7.1%
o 37102
 
6.5%
l 28878
 
5.0%
s 21718
 
3.8%
t 20712
 
3.6%
h 17561
 
3.1%
Other values (123) 211976
37.0%
Cyrillic
ValueCountFrequency (%)
и 21
12.1%
о 15
 
8.6%
л 13
 
7.5%
а 12
 
6.9%
к 12
 
6.9%
е 12
 
6.9%
р 11
 
6.3%
н 11
 
6.3%
д 9
 
5.2%
й 6
 
3.4%
Other values (25) 52
29.9%
Common
ValueCountFrequency (%)
55621
86.5%
, 4100
 
6.4%
. 2987
 
4.6%
- 1336
 
2.1%
' 211
 
0.3%
12
 
< 0.1%
0 3
 
< 0.1%
( 2
 
< 0.1%
) 2
 
< 0.1%
5 1
 
< 0.1%
Other values (4) 4
 
< 0.1%
Han
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Arabic
ValueCountFrequency (%)
م 2
20.0%
ی 2
20.0%
ا 2
20.0%
د 1
10.0%
ع 1
10.0%
پ 1
10.0%
ن 1
10.0%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII 632998
99.4%
None 3937
 
0.6%
Cyrillic 174
 
< 0.1%
Arabic 10
 
< 0.1%
CJK 10
 
< 0.1%
Latin Ext Additional 3
 
< 0.1%
Hangul 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
55621
 
8.8%
e 54772
 
8.7%
a 54169
 
8.6%
r 42687
 
6.7%
n 42156
 
6.7%
i 40928
 
6.5%
o 37102
 
5.9%
l 28878
 
4.6%
s 21718
 
3.4%
t 20712
 
3.3%
Other values (55) 234255
37.0%
None
ValueCountFrequency (%)
é 930
23.6%
á 382
 
9.7%
ö 256
 
6.5%
í 244
 
6.2%
ó 218
 
5.5%
ô 158
 
4.0%
ä 146
 
3.7%
è 126
 
3.2%
ü 114
 
2.9%
ç 107
 
2.7%
Other values (69) 1256
31.9%
Cyrillic
ValueCountFrequency (%)
и 21
12.1%
о 15
 
8.6%
л 13
 
7.5%
а 12
 
6.9%
к 12
 
6.9%
е 12
 
6.9%
р 11
 
6.3%
н 11
 
6.3%
д 9
 
5.2%
й 6
 
3.4%
Other values (25) 52
29.9%
Arabic
ValueCountFrequency (%)
م 2
20.0%
ی 2
20.0%
ا 2
20.0%
د 1
10.0%
ع 1
10.0%
پ 1
10.0%
ن 1
10.0%
Latin Ext Additional
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
CJK
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%

Interactions

[Figures: 81 pairwise interaction plots]

Correlations

variable | budget | id | popularity | revenue | runtime | vote_average | vote_count | release_year | return | original_language | status
budget | 1.000 | -0.251 | 0.465 | 0.646 | 0.218 | 0.072 | 0.488 | 0.153 | 0.776 | 0.000 | 0.000
id | -0.251 | 1.000 | -0.395 | -0.275 | -0.182 | -0.147 | -0.419 | 0.383 | -0.260 | 0.074 | 0.056
popularity | 0.465 | -0.395 | 1.000 | 0.496 | 0.276 | 0.234 | 0.891 | 0.213 | 0.452 | 0.000 | 0.000
revenue | 0.646 | -0.275 | 0.496 | 1.000 | 0.247 | 0.130 | 0.521 | 0.113 | 0.853 | 0.000 | 0.000
runtime | 0.218 | -0.182 | 0.276 | 0.247 | 1.000 | 0.200 | 0.266 | 0.043 | 0.228 | 0.112 | 0.000
vote_average | 0.072 | -0.147 | 0.234 | 0.130 | 0.200 | 1.000 | 0.308 | -0.011 | 0.123 | 0.072 | 0.021
vote_count | 0.488 | -0.419 | 0.891 | 0.521 | 0.266 | 0.308 | 1.000 | 0.226 | 0.481 | 0.000 | 0.000
release_year | 0.153 | 0.383 | 0.213 | 0.113 | 0.043 | -0.011 | 0.226 | 1.000 | 0.095 | 0.145 | 0.028
return | 0.776 | -0.260 | 0.452 | 0.853 | 0.228 | 0.123 | 0.481 | 0.095 | 1.000 | 0.000 | 0.000
original_language | 0.000 | 0.074 | 0.000 | 0.000 | 0.112 | 0.072 | 0.000 | 0.145 | 0.000 | 1.000 | 0.000
status | 0.000 | 0.056 | 0.000 | 0.000 | 0.000 | 0.021 | 0.000 | 0.028 | 0.000 | 0.000 | 1.000
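
The matrix can be cross-checked against the High correlation alerts in the report's overview. The sketch below lists, for each variable, the other variables whose absolute coefficient reaches a cutoff. The cutoff of 0.5 is an assumption made here (the report does not state it), chosen because it reproduces the five alerts; `high_correlations` and `THRESHOLD` are illustrative names.

```python
# Coefficients copied from the correlation matrix in this report.
names = [
    "budget", "id", "popularity", "revenue", "runtime", "vote_average",
    "vote_count", "release_year", "return", "original_language", "status",
]
matrix = [
    [1.000, -0.251, 0.465, 0.646, 0.218, 0.072, 0.488, 0.153, 0.776, 0.000, 0.000],
    [-0.251, 1.000, -0.395, -0.275, -0.182, -0.147, -0.419, 0.383, -0.260, 0.074, 0.056],
    [0.465, -0.395, 1.000, 0.496, 0.276, 0.234, 0.891, 0.213, 0.452, 0.000, 0.000],
    [0.646, -0.275, 0.496, 1.000, 0.247, 0.130, 0.521, 0.113, 0.853, 0.000, 0.000],
    [0.218, -0.182, 0.276, 0.247, 1.000, 0.200, 0.266, 0.043, 0.228, 0.112, 0.000],
    [0.072, -0.147, 0.234, 0.130, 0.200, 1.000, 0.308, -0.011, 0.123, 0.072, 0.021],
    [0.488, -0.419, 0.891, 0.521, 0.266, 0.308, 1.000, 0.226, 0.481, 0.000, 0.000],
    [0.153, 0.383, 0.213, 0.113, 0.043, -0.011, 0.226, 1.000, 0.095, 0.145, 0.028],
    [0.776, -0.260, 0.452, 0.853, 0.228, 0.123, 0.481, 0.095, 1.000, 0.000, 0.000],
    [0.000, 0.074, 0.000, 0.000, 0.112, 0.072, 0.000, 0.145, 0.000, 1.000, 0.000],
    [0.000, 0.056, 0.000, 0.000, 0.000, 0.021, 0.000, 0.028, 0.000, 0.000, 1.000],
]

THRESHOLD = 0.5  # assumed cutoff; not stated in the report


def high_correlations(names, matrix, threshold):
    """Map each variable to the other variables with |coefficient| >= threshold."""
    result = {}
    for i, row in enumerate(matrix):
        partners = [
            names[j]
            for j, value in enumerate(row)
            if j != i and abs(value) >= threshold
        ]
        if partners:
            result[names[i]] = partners
    return result


high = high_correlations(names, matrix, THRESHOLD)
for variable, partners in high.items():
    print(f"{variable}: {', '.join(partners)}")
```

With this cutoff, only budget, popularity, revenue, vote_count and return are flagged, which matches the variables named in the alerts.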

Missing values

2023-06-08T14:00:17.449446image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly spot patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
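A minimal sketch of the idea behind nullity correlation, assuming it is computed as the Pearson correlation between per-column "is missing" indicators. The report does not state its exact method. The function name, the toy rows, and the use of `None` for a missing value are all illustrative.

```python
import math


def nullity_correlation(rows, col_a, col_b):
    """Pearson correlation between the missing-value indicators of two columns.

    rows is a list of dicts; a value of None (or an absent key) counts as missing.
    Returns NaN when either column is entirely present or entirely missing.
    """
    a = [1.0 if row.get(col_a) is None else 0.0 for row in rows]
    b = [1.0 if row.get(col_b) is None else 0.0 for row in rows]
    n = len(rows)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")
    return cov / math.sqrt(var_a * var_b)


# Toy rows (not taken from the dataset): the two columns are always
# missing together, so their nullity correlation is +1.
toy_rows = [
    {"production_companies": None, "production_countries": None, "title": "A"},
    {"production_companies": "X", "production_countries": "Y", "title": "B"},
    {"production_companies": None, "production_countries": None, "title": "C"},
    {"production_companies": "Z", "production_countries": "W", "title": "D"},
]
print(nullity_correlation(toy_rows, "production_companies", "production_countries"))
```

A value near 1 means two columns tend to be missing in the same rows. A value near -1 means one tends to be present exactly when the other is missing.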

Sample

index | belongs_to_collection | budget | genres | id | original_language | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | ActorNames | DirectorNames
0 | Toy Story Collection | 30000000 | Animation, Comedy, Family | 862 | en | Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences. | 21.946943 | Pixar Animation Studios | United States of America | 1995-10-30 | 373554033.0 | 81.0 | English | Released | NaN | Toy Story | 7.7 | 5415.0 | 1995 | 12.451801 | Tom Hanks, Tim Allen, Don Rickles, Jim Varney, Wallace Shawn, John Ratzenberger, Annie Potts, John Morris, Erik von Detten, Laurie Metcalf, R. Lee Ermey, Sarah Freeman, Penn Jillette | John Lasseter
1 | NaN | 65000000 | Adventure, Fantasy, Family | 8844 | en | When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures. | 17.015539 | TriStar Pictures, Teitler Film, Interscope Communications | United States of America | 1995-12-15 | 262797249.0 | 104.0 | English, Français | Released | Roll the dice and unleash the excitement! | Jumanji | 6.9 | 2413.0 | 1995 | 4.043035 | Robin Williams, Jonathan Hyde, Kirsten Dunst, Bradley Pierce, Bonnie Hunt, Bebe Neuwirth, David Alan Grier, Patricia Clarkson, Adam Hann-Byrd, Laura Bell Bundy, James Handy, Gillian Barber, Brandon Obray, Cyrus Thiedeke, Gary Joseph Thorup, Leonard Zola, Lloyd Berry, Malcolm Stewart, Annabel Kershaw, Darryl Henriques, Robyn Driscoll, Peter Bryant, Sarah Gilson, Florica Vlad, June Lion, Brenda Lockmuller | Joe Johnston
2 | Grumpy Old Men Collection | 0 | Romance, Comedy | 15602 | en | A family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max. | 11.712900 | Warner Bros., Lancaster Gate | United States of America | 1995-12-22 | 0.0 | 101.0 | English | Released | Still Yelling. Still Fighting. Still Ready for Love. | Grumpier Old Men | 6.5 | 92.0 | 1995 | 0.000000 | Walter Matthau, Jack Lemmon, Ann-Margret, Sophia Loren, Daryl Hannah, Burgess Meredith, Kevin Pollak | Howard Deutch
3 | NaN | 16000000 | Comedy, Drama, Romance | 31357 | en | Cheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe. | 3.859495 | Twentieth Century Fox Film Corporation | United States of America | 1995-12-22 | 81452156.0 | 127.0 | English | Released | Friends are the people who let you be yourself... and never let you forget it. | Waiting to Exhale | 6.1 | 34.0 | 1995 | 5.090760 | Whitney Houston, Angela Bassett, Loretta Devine, Lela Rochon, Gregory Hines, Dennis Haysbert, Michael Beach, Mykelti Williamson, Lamont Johnson, Wesley Snipes | Forest Whitaker
4 | Father of the Bride Collection | 0 | Comedy | 11862 | en | Just when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own. | 8.387519 | Sandollar Productions, Touchstone Pictures | United States of America | 1995-02-10 | 76578911.0 | 106.0 | English | Released | Just When His World Is Back To Normal... He's In For The Surprise Of His Life! | Father of the Bride Part II | 5.7 | 173.0 | 1995 | 0.000000 | Steve Martin, Diane Keaton, Martin Short, Kimberly Williams-Paisley, George Newbern, Kieran Culkin, BD Wong, Peter Michael Goetz, Kate McGregor-Stewart, Jane Adams, Eugene Levy, Lori Alan | Charles Shyer
5 | NaN | 60000000 | Action, Crime, Drama, Thriller | 949 | en | Obsessive master thief, Neil McCauley leads a top-notch crew on various insane heists throughout Los Angeles while a mentally unstable detective, Vincent Hanna pursues him without rest. Each man recognizes and respects the ability and the dedication of the other even though they are aware their cat-and-mouse game may end in violence. | 17.924927 | Regency Enterprises, Forward Pass, Warner Bros. | United States of America | 1995-12-15 | 187436818.0 | 170.0 | English, Español | Released | A Los Angeles Crime Saga | Heat | 7.7 | 1886.0 | 1995 | 3.123947 | Al Pacino, Robert De Niro, Val Kilmer, Jon Voight, Tom Sizemore, Diane Venora, Amy Brenneman, Ashley Judd, Mykelti Williamson, Natalie Portman, Ted Levine, Tom Noonan, Tone Loc, Hank Azaria, Wes Studi, Dennis Haysbert, Danny Trejo, Henry Rollins, William Fichtner, Kevin Gage, Susan Traylor, Jerry Trimble, Ricky Harris, Jeremy Piven, Xander Berkeley, Begonya Plaza, Rick Avery, Hazelle Goodman, Ray Buktenica, Max Daniels, Vince Deadrick Jr., Steven Ford, Farrah Forke, Patricia Healy, Paul Herman, Cindy Katz, Brian Libby, Dan Martin, Mario Roberts, Thomas Rosales, Jr., Yvonne Zima, Mick Gould, Bud Cort, Viviane Vives, Kim Staunton, Martin Ferrero, Brad Baldridge, Andrew Camuccio, Kenny Endoso, Kimberly Flynn, Niki Harris, Bill McIntosh, Rick Marzan, Terry Miller, Daniel O'Haco, Kai Soremekun, Peter Blackwell, Trevor Coppola, Mary Kircher, Darin Mangan, Robert Miranda, Manny Perry, Iva Franks Singer, Tim Werner, Philip Ettington | Michael Mann
6 | NaN | 58000000 | Comedy, Romance | 11860 | en | An ugly duckling having undergone a remarkable change, still harbors feelings for her crush: a carefree playboy, but not before his business-focused brother has something to say about it. | 6.677277 | Paramount Pictures, Scott Rudin Productions, Mirage Enterprises, Sandollar Productions, Constellation Entertainment, Worldwide, Mont Blanc Entertainment GmbH | Germany, United States of America | 1995-12-15 | 0.0 | 127.0 | Français, English | Released | You are cordially invited to the most surprising merger of the year. | Sabrina | 6.2 | 141.0 | 1995 | 0.000000 | Harrison Ford, Julia Ormond, Greg Kinnear, Angie Dickinson, Nancy Marchand, John Wood, Richard Crenna, Lauren Holly, Dana Ivey, Fanny Ardant, Patrick Bruel, Paul Giamatti, Miriam Colón, Elizabeth Franz, Valérie Lemercier, Becky Ann Baker, John C. Vennema, Margo Martindale, J. Smith-Cameron, Christine Luneau-Lipton, Michael Dees, Denis Holmes, Jo-Jo Lowe, Ira Wheeler, Philippa Cooper, Ayako Kawahara, François Genty, Guillaume Gallienne, Inés Sastre, Phina Oruche, Andrea Behalikova, Jennifer Herrera, Kristina Kumlin, Eva Linderholm, Carmen Chaplin, Micheline Van de Velde, Joanna Rhodes, Alan Boone, Patrick Forster-Delmas, Kentaro Matsuo, Peter McKernan, Ed Connelly, Ronald L. Schwary, Alvin Lum, Siching Song, Phil Nee, Randy Becker, Susan Browning, Anthony Mondal, Peter Parks, Woodrow Asai, Eric Bruno Borgman, Michael Cline, Christopher Del Gaudio, Philippe Hartmann, Jerry Quinn, Dori Rosenthal | Sydney Pollack
7 | NaN | 0 | Action, Adventure, Drama, Family | 45325 | en | A mischievous young boy, Tom Sawyer, witnesses a murder by the deadly Injun Joe. Tom becomes friends with Huckleberry Finn, a boy with no future and no family. Tom has to choose between honoring a friendship or honoring an oath because the town alcoholic is accused of the murder. Tom and Huck go through several adventures trying to retrieve evidence. | 2.561161 | Walt Disney Pictures | United States of America | 1995-12-22 | 0.0 | 97.0 | English, Deutsch | Released | The Original Bad Boys. | Tom and Huck | 5.4 | 45.0 | 1995 | 0.000000 | Jonathan Taylor Thomas, Brad Renfro, Rachael Leigh Cook, Michael McShane, Amy Wright, Eric Schweig, Tamara Mello | Peter Hewitt
8 | NaN | 35000000 | Action, Adventure, Thriller | 9091 | en | International action superstar Jean Claude Van Damme teams with Powers Boothe in a Tension-packed, suspense thriller, set against the back-drop of a Stanley Cup game.Van Damme portrays a father whose daughter is suddenly taken during a championship hockey game. With the captors demanding a billion dollars by game's end, Van Damme frantically sets a plan in motion to rescue his daughter and abort an impending explosion before the final buzzer... | 5.231580 | Universal Pictures, Imperial Entertainment, Signature Entertainment | United States of America | 1995-12-22 | 64350171.0 | 106.0 | English | Released | Terror goes into overtime. | Sudden Death | 5.5 | 174.0 | 1995 | 1.838576 | Jean-Claude Van Damme, Powers Boothe, Dorian Harewood, Raymond J. Barry, Ross Malinger, Whittni Wright | Peter Hyams
9 | James Bond Collection | 58000000 | Adventure, Action, Thriller | 710 | en | James Bond must unmask the mysterious head of the Janus Syndicate and prevent the leader from utilizing the GoldenEye weapons system to inflict devastating revenge on Britain. | 14.686036 | United Artists, Eon Productions | United Kingdom, United States of America | 1995-11-16 | 352194034.0 | 130.0 | English, Pусский, Español | Released | No limits. No fears. No substitutes. | GoldenEye | 6.6 | 1194.0 | 1995 | 6.072311 | Pierce Brosnan, Sean Bean, Izabella Scorupco, Famke Janssen, Joe Don Baker, Judi Dench, Gottfried John, Robbie Coltrane, Alan Cumming, Tchéky Karyo, Desmond Llewelyn, Samantha Bond, Michael Kitchen, Serena Gordon, Simon Kunz, Billy J. Mitchell, Constantine Gregory, Minnie Driver, Michelle Arthur, Ravil Isyanov | Martin Campbell
index | belongs_to_collection | budget | genres | id | original_language | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | ActorNames | DirectorNames
43087 | NaN | 0 | Comedy, Drama | 420346 | en | The Morning After is a feature film that consists of 8 vignettes that are inter-cut throughout the film. The 8 vignettes are about when you wake up next to someone the next morning... | 0.139936 | Oops Doughnuts Productions, He and She Films | United States of America | 2015-01-11 | 0.0 | 79.0 | English | Released | What happened last night? | The Morning After | 4.0 | 2.0 | 2015 | 0.0 | Markie Adams, Roberto Aguire, Tina Arning, Lauren Barnette, Christina Collard, Chelsea Edmundson, Ben Esler, Vanessa Evigan, Alizee Gaillard, Karli Rae Grogan | Shanra J. Kehl
43088 | NaN | 0 | NaN | 67179 | it | Sentenced to life imprisonment for illegal activities, Italian International member Giulio Manieri holds on to his political ideals while struggling against madness in the loneliness of his prison cell. | 0.225051 | NaN | NaN | 1972-01-01 | 0.0 | 90.0 | Italiano | Released | NaN | St. Michael Had a Rooster | 6.0 | 3.0 | 1972 | 0.0 | Giulio Brogi, Renato Cestiè, Vito Cipolla, Daniele Dublino | Paolo Taviani, Vittorio Taviani
43089 | NaN | 0 | Horror, Mystery, Thriller | 84419 | en | An unsuccessful sculptor saves a madman named "The Creeper" from drowning. Seeing an opportunity for revenge, he tricks the psycho into murdering his critics. | 0.222814 | Universal Pictures | United States of America | 1946-03-29 | 0.0 | 65.0 | English | Released | Meet...The CREEPER! | House of Horrors | 6.3 | 8.0 | 1946 | 0.0 | Rondo Hatton, Robert Lowery, Virginia Grey, Bill Goodwin, Martin Kosleck, Alan Napier, Howard Freeman, Virginia Christine, Joan Shawlee, Byron Foulger, Syd Saylor | Jean Yarbrough
43090 | NaN | 0 | Mystery, Horror | 390959 | en | In this true-crime documentary, we delve into the murder spree that was the inspiration for Joe Berlinger's "Book of Shadows: Blair Witch 2". | 0.076061 | NaN | NaN | 2000-10-22 | 0.0 | 45.0 | English | Released | NaN | Shadow of the Blair Witch | 7.0 | 2.0 | 2000 | 0.0 | Tony Abatemarco, Andre Brooks, Mariclare Costello, Bill Dreggors, Apollo Dukakis, Philip Friedman, James Gleason, Dilva Henry, Bari Hochwald, Wendy Hoffman, John Huck, Rachel Moskowitz, Sandy Mulvihill, Roger Nolan, Chris Parnell, Byrne Piven, Richard Sexton, Rich Williams, Ray Xifo | Ben Rock
43091 | NaN | 0 | Horror | 289923 | en | A film archivist revisits the story of Rustin Parr, a hermit thought to have murdered seven children while under the possession of the Blair Witch. | 0.386450 | Neptune Salad Entertainment, Pirie Productions | United States of America | 2000-10-03 | 0.0 | 30.0 | English | Released | Do you know what happened 50 years before "The Blair Witch Project"? | The Burkittsville 7 | 7.0 | 1.0 | 2000 | 0.0 | Monty Bane, Lucy Butler, David Grammer, Bill Dreggors, Frank Pastor, Heather Donahue, Joshua Leonard, Michael C. Williams | Ben Rock
43092 | NaN | 0 | Science Fiction | 222848 | en | It's the year 3000 AD. The world's most dangerous women are banished to a remote asteroid 45 million light years from earth. Kira Murphy doesn't belong; wrongfully accused of a crime she did not commit, she's thrown in this interplanetary prison and left to her own defenses. But Kira's a fighter, and soon she finds herself in the middle of a female gang war; where everyone wants a piece of the action... and a piece of her! "Caged Heat 3000" takes the Women-in-Prison genre to a whole new level... and a whole new galaxy! | 0.661558 | Concorde-New Horizons | United States of America | 1995-01-01 | 0.0 | 85.0 | English | Released | NaN | Caged Heat 3000 | 3.5 | 1.0 | 1995 | 0.0 | Lisa Boyle, Kena Land, Zaneta Polard, Don Yanan, Debra K. Beatty, Mark Sikes, Robert J. Ferrelli, Ellyn Dawn Humphreys, Ron Jeremy, Ben Ramsey | Aaron Osborne
43093 | NaN | 0 | Drama, Action, Romance | 30840 | en | Yet another version of the classic epic, with enough variation to make it interesting. The story is the same, but some of the characters are quite different from the usual, in particular Uma Thurman's very special maid Marian. The photography is also great, giving the story a somewhat darker tone. | 5.683753 | Westdeutscher Rundfunk (WDR), Working Title Films, 20th Century Fox Television, CanWest Global Communications | Canada, Germany, United Kingdom, United States of America | 1991-05-13 | 0.0 | 104.0 | English | Released | NaN | Robin Hood | 5.7 | 26.0 | 1991 | 0.0 | Patrick Bergin, Uma Thurman, David Morrissey, Jürgen Prochnow, Jeroen Krabbé | John Irvin
43094 | NaN | 0 | Drama | 111109 | tl | An artist struggles to finish his work while a storyline about a cult plays in his head. | 0.178241 | Sine Olivia | Philippines | 2011-11-17 | 0.0 | 360.0 | NaN | Released | NaN | Century of Birthing | 9.0 | 3.0 | 2011 | 0.0 | Angel Aquino, Perry Dizon, Hazel Orencio, Joel Torre, Bart Guingona, Soliman Cruz , Roeder, Angeli Bayani, Dante Perez, Betty Uy-Regala, Modesta | Lav Diaz
43095 | NaN | 0 | Action, Drama, Thriller | 67758 | en | When one of her hits goes wrong, a professional assassin ends up with a suitcase full of a million dollars belonging to a mob boss ... | 0.903007 | American World Pictures | United States of America | 2003-08-01 | 0.0 | 90.0 | English | Released | A deadly game of wits. | Betrayal | 3.8 | 6.0 | 2003 | 0.0 | Erika Eleniak, Adam Baldwin, Julie du Page, James Remar, Damian Chapa, Louis Mandylor, Tom Wright, Jeremy Lelliott, James Quattrochi, Jason Widener, Joe Sabatino, Kiko Ellsworth, Don Swayze, Peter Dobson, Darrell Dubovsky | Mark L. Lester
43096 | NaN | 0 | NaN | 227506 | en | In a small town live two brothers, one a minister and the other one a hunchback painter of the chapel who lives with his wife. One dreadful and stormy night, a stranger knocks at the door asking for shelter. The stranger talks about all the good things of the earthly life the minister is missing because of his puritanical faith. The minister comes to accept the stranger's viewpoint but it is others who will pay the consequences because the minister will discover the human pleasures thanks to, ehem, his sister- in -law… The tormented minister and his cuckolded brother will die in a strange accident in the chapel and later an infant will be born from the minister's adulterous relationship. | 0.003503 | Yermoliev | Russia | 1917-10-21 | 0.0 | 87.0 | NaN | Released | NaN | Satan Triumphant | 0.0 | 0.0 | 1917 | 0.0 | Iwan Mosschuchin, Nathalie Lissenko, Pavel Pavlov, Aleksandr Chabrov, Vera Orlova | Yakov Protazanov
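The `return` values in the sample rows look consistent with revenue divided by budget, with 0 recorded when the budget is 0. This is an inference from the displayed numbers, not something the report states, and `movie_return` is a hypothetical helper name. A small sketch that checks the inference against three of the sample rows:

```python
def movie_return(revenue, budget):
    """Revenue-to-budget ratio; 0.0 when the budget is recorded as 0.

    The zero-budget rule is an assumption inferred from the sample rows.
    """
    if budget == 0:
        return 0.0
    return revenue / budget


# (title, revenue, budget, return value shown in the sample table)
samples = [
    ("Toy Story", 373554033.0, 30000000, 12.451801),
    ("Jumanji", 262797249.0, 65000000, 4.043035),
    ("Father of the Bride Part II", 76578911.0, 0, 0.0),
]

for title, revenue, budget, reported in samples:
    computed = round(movie_return(revenue, budget), 6)
    print(f"{title}: computed={computed}, reported={reported}")
```

If this reading is right, the strong correlation of `return` with both `budget` and `revenue` is partly structural, since `return` is derived from those two columns.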

Duplicate rows

Most frequently occurring

index | belongs_to_collection | budget | genres | id | original_language | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | ActorNames | DirectorNames | # duplicates
32 | NaN | 0 | Thriller, Mystery | 141971 | fi | Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | 0.411949 | Filmiteollisuus Fine | Finland | 2008-12-26 | 0.0 | 108.0 | suomi | Released | Which one is the first to return - memory or the murderer? | Blackout | 6.7 | 3.0 | 2008 | 0.0 | Petteri Summanen, Ismo Kallio, Eppu Salminen, Irina Björklund, Hannu-Pekka Björkman, Jenni Banerjee, Mikko Leppilampi, Lena Meriläinen, Mari Perankoski, Risto Kaskilahti | JP Siili | 9
7 | Why We Fight | 0 | Documentary | 159849 | en | The third film of Frank Capra's 'Why We Fight" propaganda film series, dealing with the Nazi conquest of Western Europe in 1940. | 0.473322 | NaN | United States of America | 1943-01-01 | 0.0 | 57.0 | English | Released | NaN | Why We Fight: Divide and Conquer | 5.0 | 1.0 | 1943 | 0.0 | Knox Manning, Murray Alper, General Bergeret, Monte Blue, Karl Brandt, Maurice Brierre, Winston Churchill, Ann Codee, Walter Darré, Charles de Gaulle | Frank Capra, Anatole Litvak | 4
11 | NaN | 0 | Action, Horror, Science Fiction | 18440 | en | When a comet strikes Earth and kicks up a cloud of toxic dust, hundreds of humans join the ranks of the living dead. But there's bad news for the survivors: The newly minted zombies are hell-bent on eradicating every last person from the planet. For the few human beings who remain, going head to head with the flesh-eating fiends is their only chance for long-term survival. Yet their battle will be dark and cold, with overwhelming odds. | 1.436085 | NaN | United States of America | 2007-01-01 | 0.0 | 89.0 | English | Released | NaN | Days of Darkness | 5.0 | 5.0 | 2007 | 0.0 | Sabrina Gennarino, Tom Eplin | Jake Kennedy | 4
12 | NaN | 0 | Adventure, Animation, Drama, Action, Foreign | 23305 | en | In feudal India, a warrior (Khan) who renounces his role as the longtime enforcer to a local lord becomes the prey in a murderous hunt through the Himalayan mountains. | 1.967992 | Filmfour | France, Germany, India, United Kingdom | 2001-09-23 | 0.0 | 86.0 | हिन्दी | Released | NaN | The Warrior | 6.3 | 15.0 | 2001 | 0.0 | Irrfan Khan, Puru Chibber, Aino Annuddin, Manoj Mishra, Nanhe Khan, Chander Singh, Hemant Maahaor, Mandakini Goswami, Sunita Sharma, Shaukat Baig, Gori Shanker, Prabhuram, Wagaram, Ajai Rohilla, Noor Mani, Sitaram Panchal, Chander Prakash Vyas, Sanjal, Anupam Shyam, Amit Kumar, Damayanti Marfatia, Trilok Singh, Pushpa Negi, Karuna Sarah Davis, Rakesh Mehra, Anuradha Advanti, Ismail Bashey, Madhu | Asif Kapadia | 4
13 | NaN | 0 | Comedy | 97995 | en | After breaking a mirror in his home, superstitious Max tries to avoid situations which could bring bad luck but in doing so, causes himself the worst luck imaginable. | 0.141558 | Max Linder Productions | United States of America | 1921-02-06 | 0.0 | 62.0 | English | Released | NaN | Seven Years Bad Luck | 5.6 | 4.0 | 1921 | 0.0 | Max Linder, Alta Allen, Ralph McCullough, Betty K. Peterson, F.B. Crayne, Chance Ward, Hugh Saxon, Thelma Percy, C.E. Anderson, Lola Gonzales, Harry Mann, Joe Martin | Max Linder | 4
14 | NaN | 0 | Comedy, Drama | 11115 | en | As an ex-gambler teaches a hot-shot college kid some things about playing cards, he finds himself pulled into the world series of poker, where his protégé is his toughest competition. | 6.880365 | Andertainment Group, Crescent City Pictures, Tag Entertainment | United States of America | 2008-01-29 | 0.0 | 85.0 | English | Released | NaN | Deal | 5.2 | 22.0 | 2008 | 0.0 | Burt Reynolds, Bret Harrison, Shannon Elizabeth, Maria Mason, Jennifer Tilly, Gary Grubbs, Charles Durning, Caroline Mckinley, Brandon Ray Olive, Jon Eyez, J.D. Evermore | Gil Cates Jr. | 4
15 | NaN | 0 | Comedy, Drama | 265189 | sv | While holidaying in the French Alps, a Swedish family deals with acts of cowardliness as an avalanche breaks out. | 12.165685 | Motlys, Coproduction Office, Film i Väst | Norway, Sweden, France | 2014-08-15 | 1359497.0 | 118.0 | Français, Norsk, svenska, English | Released | NaN | Force Majeure | 6.8 | 255.0 | 2014 | 0.0 | Lisa Loven Kongsli, Johannes Bah Kuhnke, Clara Wettergren, Vincent Wettergren, Brady Corbet, Kristofer Hivju, Fanni Metelius, Karin Myrenberg, Johannes Moustos | Ruben Östlund | 4
16 | NaN | 0 | Crime, Drama, Thriller | 5511 | fr | Hitman Jef Costello is a perfectionist who always carefully plans his murders and who never gets caught. | 9.091288 | Fida cinematografica, Compagnie Industrielle et Commerciale Cinématographique (CICC), TC Productions, Filmel | France, Italy | 1967-10-25 | 39481.0 | 105.0 | Français | Released | There is no solitude greater than that of the Samurai | Le Samouraï | 7.9 | 187.0 | 1967 | 0.0 | Alain Delon, François Périer, Nathalie Delon, Cathy Rosier, Catherine Jourdan, Jacques Leroy, Michel Boisrond, Robert Favart, Jean-Pierre Posier, Roger Fradet, Carlo Nell, Robert Rondo, André Salgues, André Thorent, Jacques Deschamps, Georges Casati, Jacques Léonard, Pierre Vaudier, Maurice Magalon, Gaston Meunier, Jean Gold, Georges Billy, Ari Aricardi, Guy Bonnafoux, Humberto Catalano, Carl Lechner, Maria Maneva | Jean-Pierre Melville | 4
19 | NaN | 0 | Drama | 25541 | da | Former Danish servicemen Lars and Jimmy are thrown together while training in a neo-Nazi group. Moving from hostility through grudging admiration to friendship and finally passion, events take a darker turn when their illicit relationship is uncovered. | 2.587911 | NaN | Sweden, Denmark | 2009-10-21 | 0.0 | 90.0 | Dansk | Released | NaN | Brotherhood | 7.1 | 21.0 | 2009 | 0.0 | Nicolas Bro, David Dencik, Claus Flygare, Michael Grønnemose, Hanne Hedelund, Thure Lindhardt, Mads Rømer Brolin-Tani | Nicolo Donato | 4
24 | NaN | 0 | Drama, Comedy | 168538 | en | In Zola's Paris, an ingenue arrives at a tony bordello: she's Nana, guileless, but quickly learning to use her erotic innocence to get what she wants. She's an actress for a soft-core filmmaker and soon is the most popular courtesan in Paris, parlaying this into a house, bought for her by a wealthy banker. She tosses him and takes up with her neighbor, a count of impeccable rectitude, and with the count's impressionable son. The count is soon fetching sticks like a dog and mortgaging his lands to satisfy her whims. | 1.276602 | Cannon Group, Metro-Goldwyn-Mayer (MGM) | NaN | 1983-06-13 | 0.0 | 92.0 | NaN | Released | NaN | Nana, the True Key of Pleasure | 4.7 | 3.0 | 1983 | 0.0 | Katya Berger, Jean-Pierre Aumont, Yehuda Efroni, Yehuda Efroni, Massimo Serato, Debra Berger, Shirin Taylor, Annie Belle, Paul Müller, Marcus Beresford, Robert Bridges, Tom Felleghy | Dan Wolman | 4
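A "most frequently occurring" listing like the one above can be approximated by counting identical records and keeping those seen more than once. The sketch below is illustrative only. `duplicate_counts` is a hypothetical helper, the toy records are shortened to two fields, and it assumes the "# duplicates" column counts every occurrence of a record, which the report does not spell out. Missing values are represented as `None` here, because two separate float NaN objects do not compare equal and would not be grouped together.

```python
from collections import Counter


def duplicate_counts(rows):
    """Return (record, occurrences) pairs for records seen more than once,
    most frequent first. Each row must be a sequence of hashable values."""
    counts = Counter(tuple(row) for row in rows)
    repeated = [(record, n) for record, n in counts.items() if n > 1]
    return sorted(repeated, key=lambda item: item[1], reverse=True)


# Toy records (title, release_year), shortened for illustration.
toy_rows = [
    ("Blackout", 2008),
    ("Deal", 2008),
    ("Blackout", 2008),
    ("Force Majeure", 2014),
    ("Blackout", 2008),
    ("Deal", 2008),
]

for record, occurrences in duplicate_counts(toy_rows):
    print(record, occurrences)
```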